October 5, 2020 • 40 minutes
In the conclusion of this two part episode, Jeremy chats with Alex Casalboni about strategies to optimize Lambda functions, how to use his AWS Lambda Power Tuning project to find the right balance of cost and performance, and how to combine Lambda with other AWS services to maximize the power of each execution.
Watch this episode on YouTube: https://youtu.be/m2NB_0J5fms
Jeremy: If we go back to having to run this on every function: maybe you make a change to a function, something like that, and it just becomes very, very tedious, and probably a lot of work, to run this on every single function. But as you start to see... You run it on a few functions, maybe different types of workloads, and those patterns start to emerge, right?
Alex: Absolutely. There is not an infinite set of patterns. I have identified about six or seven. Usually, you end up in one of these. There are other patterns where the output is a bit random, meaning there is some downstream dependency that is not scaling as well as Lambda is. So you might see some noise in the data. But yeah, usually you end up in one of these categories. I think there is a last category that is a bit special, where you actually are downloading or uploading a lot of data. I've seen this with S3, where maybe you need to download 50 or 100 megabytes of data from S3. I wouldn't recommend it. But if you really have to do that, the power tuning implications are very interesting, in my opinion, because you can also change one line of code, and I did this experiment with Python, so it was pretty easy. I think you can do the same with Node or Java or other SDKs.
So, if you enable the multi threading options in the SDK especially at a high power like above 1.8 gigabytes you get two cores. And so, you just start downloading or uploading using 10 or 20 threads, you actually see a massive difference there. So, you might see 10% improvement for cheaper cost. So, if that's what you're doing with Lambda, you might consider full power. But again, check the numbers.
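To make the multi-threading point concrete, here is a minimal stdlib-only sketch of the idea behind a threaded transfer: split the object into byte ranges and fetch them concurrently. The `fetch_range` callable, the part size, and the thread count are assumptions for illustration, not Alex's experiment. In boto3 the corresponding setting is `TransferConfig(max_concurrency=...)` passed to `download_file`.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, part_size):
    """Return inclusive (start, end) byte ranges covering total_size bytes."""
    return [
        (start, min(start + part_size, total_size) - 1)
        for start in range(0, total_size, part_size)
    ]


def parallel_download(fetch_range, total_size, part_size, threads=10):
    """Fetch every range on a thread pool and join the parts in order.

    fetch_range(start, end) is a hypothetical callable that returns the
    bytes for one inclusive range, e.g. an HTTP GET with a Range header.
    """
    ranges = split_ranges(total_size, part_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order, so the parts line up again.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)


# Stand-in for a remote object so the sketch runs without AWS.
blob = bytes(range(256)) * 40
data = parallel_download(lambda s, e: blob[s:e + 1], len(blob), 1024)
```

Whether extra threads actually help depends on the memory setting and the runtime, so the numbers still need to be checked per function.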
Jeremy: Right. And the other thing too is, again, knowing... You mentioned Python. Knowing your runtime is important because Node is single threaded. So, even if you do go over the 1.8, you do not get the benefit of a second core because Node doesn't work that way. All right. So, you mentioned something really interesting. I think this is another fascinating thing about pay per use services: Lambda has a 100 millisecond billing model. So, if you run something for 99 milliseconds you pay for 100 milliseconds. If you run something for 101 milliseconds you pay for 200 milliseconds. So, I think an important piece of this, if you are trying to optimize for cost, is also understanding that billing rounding thing, right?
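The rounding Jeremy describes can be written down in a few lines. This is a sketch assuming the 100 millisecond increment discussed here; the per GB-second price is an assumed list price and should be checked against your region.

```python
import math

PRICE_PER_GB_SECOND = 0.0000166667  # assumed list price, check your region


def billed_ms(duration_ms, increment_ms=100):
    """Round a duration up to the next billing increment."""
    return math.ceil(duration_ms / increment_ms) * increment_ms


def invocation_cost(duration_ms, memory_mb):
    """Compute-only cost of one invocation (request fee not included)."""
    gb_seconds = (memory_mb / 1024) * (billed_ms(duration_ms) / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND


print(billed_ms(99))   # 100
print(billed_ms(101))  # 200
```

Under these assumptions a function averaging 101 milliseconds costs twice as much per invocation as one averaging 99, which is why the last few milliseconds can matter.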
Alex: Yeah, that's true. I've been talking to some development teams. And it's very common that you develop a serverless application, and you end up, as you were saying, with 10, or 50, or 100 functions. And one day the manager wakes up and wants to optimize for cost or for performance. And you're like, sure, but where do I start? I have 100 functions. But I think it's also important to know what your functions are doing, to detect the right pattern and to know where it makes sense to optimize. There are cases where your team or yourself or your manager may want to only optimize for cost. It's a cost optimization project, or whatever.
And you might end up optimizing some functions where there is no way that you can actually shave off enough milliseconds to go down one level, one 100 millisecond level. So, maybe you're just optimizing for the user experience, which is great. Or maybe it's not a customer facing app, so it doesn't matter. But I think it makes sense to understand how cost and performance are related in serverless. Because sometimes they are aligned with each other, meaning you can optimize for both in one shot. But you still want to be aware, especially if it's about prioritizing between a large set of functions. Actually, I got that feedback a lot. If I see a direction for Lambda Power Tuning evolving into something that would help a development team handle multiple functions, I'll build something like a prioritizer, or something that helps you detect those kinds of functions more easily, or helps you with a batch of functions, for example.
Jeremy: Right. Yeah, no, I think that'd be another very cool project. I think you asked the right question there. And at least for me, the question is sort of, what are you optimizing for? Are we trying to make the user experience better, so we have lower latencies? Are we trying to get the cost down on the backend, maybe for running ETL tasks and things like that? Those are certainly things where I think this comes in. Really, an important thing to consider is to say, "Are we trying to save 10 bucks a month on our front end, just so that we're, again, saving $10 a month, when maybe it takes 120 milliseconds for our API to respond? Or are we trying to save potentially thousands of dollars on the backend if we're running these complex ETL tasks?" And that brings me to something... So, Joe Emison, who runs Branch Insurance, usually posts a screenshot of his bill every month. And running an entire online insurance agency, his Lambda bill was $22.65. So is optimizing for cost in that situation something that should even cross your mind?
Alex: Yeah, that's a fair question. It probably isn't. I would still suggest you run Lambda Power Tuning because you might be willing to pay $26, and get a 30% performance improvement. So, it's not like one or the other. Depending on your scale, depending on the use case, depending on the customer needs, you might decide to invest in performance. And with Lambda it's pretty simple. You can visualize it. You have one knob, and it's fairly simple. So, as I was saying, it's almost free to run this power tuning process. I've seen customers who actually run it at every deployment, basically multiple times a day, and it's still less than $1. So why not?
Jeremy: Yeah, yeah. And I totally agree with you. I think the performance aspect of it is the biggest thing to optimize for. And when you see tweets or blog posts that criticize serverless performance, it's often because they don't know what it is possible to tweak. But again, going back to the idea, being able to measure that is an important component. So, let's get more specifically into some of these things you can do to these Lambda functions. Because again, if you just run Lambda Power Tuning, and you see, okay, this is the cost, or I can get better performance if I turn the memory up, that is not the only way to get better performance.
That's not the only way to optimize your cost. There are so many other things that you could potentially do that would bring those things down either lower the execution time, or some of that. So, let's get into those. And I know you do a lot of presentations, all virtual now, unfortunately. I mean, again, I wish we could all get back doing presentations again, but I know that you like to break these things down into a couple different categories of optimization. So obviously, there's sort of our general optimizations that we can do, but then we have things that are very specific for cold starts. And then other things that would require you maybe to re-architect your application. So, maybe you can explain how you approach those categories?
Alex: Yeah. Sure. So, if what you're concerned about is cold start, we talked about it at the very beginning. There are a few things you can do there. It's not likely to be the majority of your executions. But if it's customer facing you do want to optimize for cold starts. It's about avoiding monolithic functions. You can optimize your dependencies, or rather minimize your dependencies. In some languages, you can minify or uglify your code. You can try to initialize some objects in a lazy fashion depending on what libraries you're using. You can optimize how you import the SDK components and individual clients.
These are all things that allow you to shave off maybe 10, 50, even 100 milliseconds of cold start execution time. So, definitely worth having a look. Although I always tell people: don't stop there. That's probably 5 or 6% of your overall executions. You also want to optimize all the others. So, to optimize all the others, usually you either have to re-architect everything or rethink some components or some parts of your architecture. And that's great if you can do that. Sometimes, unfortunately, you cannot do that, or you might decide it's not worth it. There are many reasons why you may or may not want to do it.
Alex: Luckily, there are some low hanging fruits that you can actually take without re-architecting anything or refactoring your code, basically. We have already talked about one. One is memory optimization, resource allocation: zero code changes, zero architectural changes. We have talked about what you might expect depending on the pattern. There are a couple more. I think, if I remember correctly, one is the keep alive option in the SDK. Often, you don't want to re-initiate a connection every time you want to talk to DynamoDB or Cognito or some other AWS service, and you can just keep that connection alive with just one configuration parameter or one environment variable. So, that's a pretty easy low hanging fruit where you can see massive impact.
Not in the cold starts, because if it's a cold start you will actually see the impact of creating the connection. But you will see a large impact in the remaining large percentage of your executions. There are a few more that are very specific to some runtimes or to some use cases. But that's the way I like to think about it. The rest of the optimization strategies I'm aware of, unfortunately, kind of require you to rethink some part of the architecture. We can talk about some of these if you want.
Jeremy: Yeah, so I mean, one of the things I think is really interesting about optimizing once you get past the cold start thing is that every time you have to make a network call, every time you have to do something that requires some sort of synchronous call, you are not only paying for that execution time, right? You're also adding extra libraries into your code in order to make that happen. So, one of the cool optimizations, and maybe people might not think of it as an optimization, though I certainly do, is Lambda destinations. Because to me, if you have an output of your function that needs to go somewhere, you would otherwise have to put in all that extra code and wait for the call to EventBridge, or wait for the call to SQS or SNS. If you don't have to do that, and you can use Lambda destinations to do that for you, I think that's a big optimization right there. I mean, maybe not a big optimization, but certainly interesting.
Alex: But there are many cases where your average execution time is slightly above a 100 millisecond interval: 105 milliseconds, 110 milliseconds. And many developers ask me, "How do I shave off those five milliseconds? I can't touch my code further." And so, there are cases where, yes, you can just delegate to the Lambda service the invocation of SNS or EventBridge or the destination that you want to invoke at the end of your execution. And not many people think of it as a cost optimization or a performance optimization. Overall, it's not like Lambda can do it faster than your code would do. So, it's not really a performance optimization. But because you're not paying for the execution time of that API call, you might be able to shave off those five milliseconds. So, in some cases it might show some benefits for sure.
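As a rough sketch of what delegating to destinations looks like: the handler only returns its result, and the routing lives in configuration. The event field and ARNs below are placeholders. Destinations apply to asynchronous invocations, and the dictionary has the shape expected by boto3's `put_function_event_invoke_config(FunctionName=..., DestinationConfig=...)`.

```python
def handler(event, context):
    """Do the work and return; no SNS or EventBridge client is needed here.

    With an on-success destination configured, the Lambda service forwards
    this return value for asynchronous invocations.
    """
    order_id = event["order_id"]  # hypothetical input field
    return {"order_id": order_id, "status": "processed"}


def destination_config(on_success_arn, on_failure_arn):
    """Build the DestinationConfig structure for the function's
    asynchronous invocation settings (ARNs are placeholders)."""
    return {
        "OnSuccess": {"Destination": on_success_arn},
        "OnFailure": {"Destination": on_failure_arn},
    }
```

The function package no longer needs the publishing code, and the billed duration ends when the handler returns.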
Jeremy: Right. Yeah. And so, another thing too that can really cut down time, especially when you are doing warm invocations, is reusing any sort of global variable or connection or thing. You mentioned the HTTP keep alive, and that's great. But you're not really maintaining a connection in the same way you would maintain a connection to, say, an RDS cluster or something like that. So, the ability for you to... And you also mentioned lazy loading in there, which I think is another interesting thing, where you don't necessarily have to connect to the MySQL server when there's a cold start. Maybe that function doesn't need to connect to it until it actually needs the connection. But once you have the connection, having that global reuse of those variables I think is another way that, again, you don't have to keep reaching somewhere to rehydrate state.
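The combination of lazy loading and global reuse can be sketched as follows. The `connect` factory and the `action` field are hypothetical; the point is only that the connection is created on first use and then kept in module scope for later warm invocations.

```python
_cache = {}  # module-level, survives across warm invocations


def get_connection(connect):
    """Create the connection on first use, then reuse it.

    connect is a hypothetical zero-argument factory, e.g. a function that
    opens a MySQL connection.
    """
    if "conn" not in _cache:
        _cache["conn"] = connect()
    return _cache["conn"]


def handler(event, context, connect=dict):
    # Paths that never touch the database never pay for connecting.
    if event.get("action") != "query":
        return {"connected": False}
    conn = get_connection(connect)
    return {"connected": True, "connection_id": id(conn)}
```

The default `connect=dict` is only a stand-in so the sketch runs without a database.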
Alex: Yeah, there are other cases too. Like if you have runtime configuration parameters that you are fetching from Parameter Store or Secrets Manager. You don't really have to fetch those at every single invocation. You can just cache them locally. And as long as you are fairly sure that the value of those parameters does not change, unless it's a new deployment or situations like that, you don't really even need to check or to have an expiration time for that caching mechanism. There are cases where maybe it's a database password, and when you rotate it, the next query is going to fail because that value is not valid anymore. So, you may want to have some kind of retry mechanism to be able to detect an invalid password error, and then just go and refetch the new password, because it rotated if you're using Secrets Manager. And then just go on doing what you were trying to do. So, there are some more interesting cases there, too.
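The cache-then-retry idea for a rotated password might look like this. It is a sketch under stated assumptions: `fetch_secret` and `query` are hypothetical callables standing in for a Secrets Manager lookup and a database call, and `InvalidPassword` stands in for whatever error your driver raises.

```python
class InvalidPassword(Exception):
    """Hypothetical error raised when the database rejects credentials."""


_cached = {}  # lives as long as the execution environment


def get_secret(fetch_secret):
    """Return the cached secret, fetching it only on first use."""
    if "password" not in _cached:
        _cached["password"] = fetch_secret()
    return _cached["password"]


def run_query(query, fetch_secret):
    """Run query(password); on a rejected password, refetch and retry once.

    fetch_secret and query are hypothetical callables standing in for a
    Secrets Manager lookup and a database call.
    """
    try:
        return query(get_secret(fetch_secret))
    except InvalidPassword:
        _cached.pop("password", None)  # the secret was probably rotated
        return query(get_secret(fetch_secret))
```

A single retry keeps the failure mode simple: if the freshly fetched password is also rejected, the error propagates.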
Jeremy: Yeah, and I mentioned RDS proxy too, that... Or I don't think I mentioned. I mentioned RDS. I didn't mention RDS proxy. That is actually another thing where you might not necessarily think of it as an optimization. But if you do not have to keep retrying connections, and you can get that connection pooling on the backend, so that you're minimizing the amount of stress on the database because you're using connection pooling. Those are all additional optimizations that could actually make query results come back faster.
Alex: Actually, RDS proxy is going to fetch the secrets from Secrets Manager for you. So that's...
Jeremy: Also that.
Alex: ...less code to write and also less execution time of Lambda itself to pay for. So, yeah, the RDS proxy is very powerful. Also, if you think about resiliency, if a node goes down you don't have to re-initiate another connection. It will just migrate the connection to another instance. So, it's pretty powerful.
Jeremy: All right. So, let's talk about a couple of best practices, right? So we talked about some ways to sort of tune or to optimize different things. But there are other sort of, I guess, best practices that can also optimize performance that can save on costs. Things that maybe aren't so much tweaking things. Just sort of general, I guess, concepts. And the first thing would be orchestration, right? If you're doing some sort of external orchestration, why do we use Step Functions as opposed to trying to write a Lambda function to do that for us?
Alex: Yeah, that's a fair question. Actually, as a developer five years ago I would have told you I love to do orchestration in my code. It's simple, I don't have to pull in other services. I would probably do everything inside my Node.js application or Java application. But there are benefits to external orchestration, especially if you consider cost. As you were saying before, every time you are invoking an API you are idling. In Lambda, idling means you're paying for nothing, right? You're paying for waiting. And one of the things I love best is Step Functions, which we have mentioned briefly because Lambda Power Tuning is based on Step Functions.
One of the best features of Step Functions is that you have... Actually, two best features. One is the wait state that allows you to wait up to I think a year without paying for idle. So, that's great if you have asynchronous stuff, or if you need to wait for human interaction and stuff like that. But you also have the ability to coordinate concurrent tasks that will converge into a final decision step maybe. And that's usually why you need to wait. Maybe you invoke three APIs, but one takes longer than the others. And you need some kind of coordination there. So, with Step Functions, you can do it. It's just a built in feature. You don't have to wait. You don't have to pay for idle in any of those three concurrent branches. So, pretty cool.
Jeremy: Right. Yeah. And waiting, the wait state thing is probably the best. And what I would love to see is, especially with longer running transactions, or longer running API calls. I know I have an API call that sometimes runs up to 25 seconds to do natural language processing. What would be great is if I could send my payload, disconnect, and then wait for a webhook response when it was finished processing, and then avoid even more of that wait state in there, but that's maybe a different topic.
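The two features Alex mentions map to the `Wait` and `Parallel` state types in the Amazon States Language. Below is a hedged sketch of a state machine definition built as a Python dictionary; every state name, the wait duration, and the ARNs are placeholders for illustration.

```python
import json

# All ARNs and state names below are placeholders for illustration.
definition = {
    "StartAt": "WaitForApproval",
    "States": {
        "WaitForApproval": {
            "Type": "Wait",
            "Seconds": 3600,  # no Lambda is running, so no idle is billed
            "Next": "CallApis",
        },
        "CallApis": {
            "Type": "Parallel",
            "Branches": [
                {
                    "StartAt": f"Call{name}",
                    "States": {
                        f"Call{name}": {
                            "Type": "Task",
                            "Resource": f"arn:aws:lambda:REGION:ACCOUNT:function:call-{name.lower()}",
                            "End": True,
                        }
                    },
                }
                for name in ("A", "B", "C")
            ],
            "Next": "Decide",
        },
        "Decide": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:decide",
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```

The `Parallel` state moves on to `Decide` only after all three branches finish, and passes their outputs along as a list, so none of the functions has to sit waiting for the slowest one.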
Another one, though, that I think is important, and this has to do with architecture and how people think about moving data around. And this is something I think Chris Munns said years and years and years ago. I don't know if it's from him. But he says, "You want to transform not transport data with Lambda function." So what does he mean by that?
Alex: So, it doesn't apply to every use case possible. I think there are cases where you need to fetch data from somewhere. It could be a relational database. It could be S3. It could be some service that has some nice filtering functionalities. So, usually, you want to fetch the least amount of data into Lambda because that means fewer bytes over the network, less idle time, less I/O time, basically. So you want to use Lambda to modify data, to manipulate data.
Ideally, you probably want to get the data directly in the event, instead of having to go and fetch it. So, if there is a way to do that, if there is a native trigger that will give you the input, or if there is a way to get the data you need instead of going to fetch it, do it. But also, sometimes there is a better way to do what you're doing. For example, there are many situations where you need to fetch data from S3, but you don't really need to fetch the whole object. It's like if you are doing a select all from your RDS database, instead of using a where clause, right? You don't want to fetch the whole database, and then do the filtering in your application code. You want to let the database do the heavy work of fetching exactly what you need, so that you can minimize the bytes sent over the network. And you can do the same in many situations, for example, with S3 Select. So, it's kind of like a database. You specify an SQL query, and you can fetch data out of a large S3 object without downloading the whole thing. So, if you are in a case like that where you're downloading a lot of data, it's very likely there is a way to only fetch the data you need and delegate that computation to another service.
Jeremy: All right. And you mentioned getting events, having sources or certain systems that will send events into your Lambda functions. And one of the things that you see a lot, especially with certain S3 events and whatever, is that some of those events are going to be uninteresting to your Lambda function. And every time your Lambda function responds to an event, you're going to pay for that processing. Even if you discard it. Even if you say, "Oh, no, I don't care about that event." You still pay for the invocation. You still pay for the hundred milliseconds at a minimum for it to just say, "I don't want this event." There are better ways to do that.
Alex: Yeah, absolutely. Usually, in the native trigger of Lambda you can probably add some kind of filtering. Whether it's S3 or SNS, or other kinds of custom events in the AWS platform. If you can filter those out in the trigger configuration, it means you're not going to pay for that. And this is typically not a huge issue. But there are cases where, for example, you want to allow all your customers to upload files, but you only want to process images. Well, there are things you can do client side to prevent them from uploading PDF files, but sometimes they will do it anyway. So, you really don't want them to reach your Lambda functions and do a denial of wallet kind of thing to your architecture. So, you really want to discard those events as soon as possible, usually in the trigger config.
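A minimal sketch of the S3 Select approach follows. It assumes a CSV object with a header row; the bucket, key, and expression are up to the caller, and `s3_client` is expected to behave like `boto3.client("s3")`, passed in so the function can also be exercised with a stub.

```python
def select_rows(s3_client, bucket, key, expression):
    """Run an SQL expression against a CSV object and return matching text.

    s3_client is expected to behave like boto3.client("s3"); bucket, key,
    and the CSV layout are assumptions for this sketch.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The response payload is a stream of events; only "Records" events
    # carry data, the others report stats and completion.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"].decode("utf-8"))
    return "".join(chunks)
```

An expression such as `SELECT s.id FROM S3Object s WHERE s.country = 'IT'` (column names hypothetical) would then return only the matching rows instead of the whole object.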
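For the image upload example, filtering in the trigger configuration could be sketched like this for S3. The function ARN and the list of suffixes are placeholders; the resulting dictionary has the shape that boto3's `put_bucket_notification_configuration` accepts as `NotificationConfiguration`.

```python
def image_only_notifications(function_arn, suffixes=(".jpg", ".png")):
    """Build an S3 notification configuration that invokes the function
    only for keys with the given suffixes.

    One configuration is emitted per suffix. The ARN and suffix list are
    placeholders.
    """
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": f"images-{suffix.lstrip('.')}",
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
            for suffix in suffixes
        ]
    }
```

With this in place an uploaded PDF never triggers the function, so there is no invocation to pay for.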
Jeremy: Right. And then another way that you can potentially save some money, or you can optimize, I guess, how often your functions run is by implementing things like throttling. Whether through an API gateway or maybe adjusting your concurrency, so how do you manage that? What are ways to kind of figure out what the right concurrency is or how much you should be throttling data to your application?
Alex: Yeah, I wish I had an answer for that, that we could discuss in a minute. It really depends on multiple things. It could be your business model. It could be your SLA. It could be a lot of things that don't allow your customers to invoke you 1000 times per second, or per minute, or per hour. You might have a freemium business model where free accounts can only invoke you once an hour, or once a minute. And so, those decisions are not really about, we want to make this thing as cheap as possible. It's more about wanting to avoid misuse of the service, or abuse of the service as well.
So, there are other things that you may not want to do for other downstream reasons. Like you may not want to delete more than 10 records from the database per second, stuff like that, just to avoid race conditions, or to avoid more problems somewhere else. Usually, it's not too much about saving money or those things like if you're scared of a DDoS attack you probably want to use WAF or AWS Shield or something that protects you at the edge, and not too much on the API layer or Lambda layer. But you can do that.
So, if there are good reasons to set a maximum concurrency for a single Lambda function, or for an API endpoint or a single route, you can specify that at the API gateway level, or even at the individual Lambda function level. Always remember, though, that you have a limited regional concurrency for all your functions. So, the sum of the concurrent executions is bound to that limit. So, there are situations where you want to allocate a given concurrency to one Lambda function, so that the concurrency of the others is not going to affect the availability of that function. But again, it's not really much about cost. It's more about resiliency and availability.
Jeremy: Right. And I mean, if you're thinking about tenancy and multi tenancy for maybe your freemium, but then also you want to split up the tenancy for large paying clients and things like that, then putting them into different accounts and stuff like that you can control. That's another way that you could potentially optimize that. All right. So, another thing I think is super important is just a really good best practice. Let's say you go through the power tuning exercise. You want to make sure that you take that information and you bake that into repeatable deployments. So obviously, infrastructure as code, huge best practice.
Alex: Absolutely. Yeah, I think nowadays you cannot do a lot quickly and reliably and securely without infrastructure as code. I still meet a lot of developers that do not do it. If there is something you want to invest in over the next six months as a developer, learn an infrastructure as code framework. For serverless you have a lot of options. I meet more and more people that are in love with Terraform, or are in love with AWS SAM, or the Serverless Framework or the CDK. To me, it doesn't matter which one you choose as long as you choose one, you learn it, and you make it your default in your organization. I've met organizations that use multiple infrastructure as code tools. It's okay. I've used many in my career as well. They're all different and all equal in a way. Some are more vendor neutral, some are more community focused, some are more provided by the vendor. So, pick your religion here and see which one works better for you.
Jeremy: Right. All right. So, last one, and I think this is an important topic, is the idea of observability in your application. So, AWS' X-Ray, there's a ton of other observability tools. But why is having something like X-Ray such an important component in your serverless applications?
Alex: I think it can give you a hint into what the hell is going on when something goes wrong. That's the simple definition I can give you. There are many cases... I was actually talking to a customer a few weeks back, just going back to Lambda Power Tuning for a second, who was using it. And they were seeing completely random results like, "Hey, this thing doesn't work. Every time I run it, it's different." So what I told them is, "Hey, turn on X-Ray, and see what's going on." So they had a legacy system downstream based on RDS, single instance, single AZ, and they were testing like 2000 concurrent executions. So, that's never going to work. But somehow they didn't have that architectural diagram in their mind, and they didn't know what was going on.
So, that's a typical situation where a new person comes up or someone who is responsible for optimizing for cost, and they have no idea what are your downstream services or what might go wrong in the overall architecture. So, having visibility into that is the only way to fix the problem sometimes. And if we were talking in 2015, it was a hard problem in the serverless space. I think now you have a lot of options out there, a lot of even community heroes, and community leaders from AWS, and from other vendors as well. So, I wouldn't say it's a solved problem. But you can also pick your religion or your platform of choice.
Jeremy: Right. Yeah, well, and I think the most important thing, especially looking at instrumentation or looking at observability when you're using something like Lambda Power Tuning, and you're trying to figure out how you can optimize it: if you keep saying, "Well, I keep turning up the memory, keep turning up the memory, and it's not having an effect," it's important to be able to go and look and see, well, how long do these API calls take? You mentioned the third party and the downstream stuff. So, if I'm calling some external API, there may be no way for me to shave time off of that. So, if you know that it doesn't matter if you have three gigs or you have 128 megs, and it's still going to take 1.6 seconds or whatever it is for that API to respond to you on average, then there's really nothing you can do to optimize that. But that's important information to have just as a sort of holistic picture of how to do all this optimization.
Alex: Yeah. Well, if you are in a situation where each function is only doing one thing, maybe talking to only one downstream service, that's pretty easy. You might live without observability. The thing is, it's quite common that you are reading from Dynamo, and then putting something into SNS or into Kinesis or whatever other service. If you're doing two or three things, and something's slowing down, or there is an error and you don't know what went wrong, maybe you're doing three things in series, and you can see it in the X-Ray trace visually. And you say, "Well, there is no reason why I shouldn't do it in parallel," and you can compress the execution time and do it much faster. So, it gives you visibility into what's going on, and for different reasons it might be very useful. For troubleshooting, for optimization, even just to have a nice picture to post on Twitter.
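The series-versus-parallel observation can be illustrated with a stdlib thread pool. This is only a sketch: `call_service` is a stand-in that sleeps instead of calling a real service, the service names are illustrative, and it assumes the calls are independent of each other.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_service(name, seconds=0.05):
    """Stand-in for an I/O-bound call to a downstream service."""
    time.sleep(seconds)
    return f"{name}: done"


SERVICES = ["dynamodb", "sns", "kinesis"]  # illustrative names only


def in_series():
    return [call_service(name) for name in SERVICES]


def in_parallel():
    # Threads suit I/O-bound waits: total time approaches the slowest call.
    with ThreadPoolExecutor(max_workers=len(SERVICES)) as pool:
        return list(pool.map(call_service, SERVICES))
```

Both functions return the same results in the same order; the parallel version simply overlaps the waiting, which is the kind of gap a trace waterfall makes visible.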
Jeremy: That's always a good reason to be able to share your X-Ray waterfall screens. All right, did we miss anything? I feel like we covered a lot of information here. I mean, we mentioned the stateless functions thing right from the beginning. Oh, actually, I don't think we mentioned stateless functions. We should talk about that for a second. That's another optimization. And you did mention not trying to hydrate stuff every single time. But again, serverless is sort of meant to be stateless, right? Why are stateless functions a good optimization?
Alex: Well, it's not like you can take a stateful function and magically convert it to stateless. It's still about where is the data coming from? Why am I depending on state during the execution? Or why am I not reading the state from somewhere else? So, if you have a stateful application that, for example, if you have three EC2 machines, and they rely on some sticky session mechanism, so you're talking to the same customer, and the session is stored on the EC2 machine instead of a Redis or a Dynamo, you might have that problem.
I think if you are developing with Lambda, it's less likely that you encounter such a situation where you're relying on sticky sessions or other stateful mechanisms. Usually, you don't really have a lot of storage or a lot of memory to store your state long term anyway. But there are still interesting things you can do at design time. So, instead of using Redis, or Dynamo, or MongoDB to store state, you might say, I will inject state into the execution, because data is coming from the external service that is invoking my function.
So, that's a way to make your functions completely stateless. And you only add the business logic. You don't even have to know where the data is coming from or where it's going to go next. So it's much easier. And I think here optimizing for cost is not only about execution costs. It's also about the cost of re-architecting an architecture. The cost of extending something in the next six to 12 months. Especially, if you're using orchestration with Step Functions it's much easier to inject state instead of fetching the state from inside the function. So, if you put all these things together, I think it will come natural to design a function that is stateless. Does that make sense?
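Injecting state can be as simple as the sketch below, where everything the function needs arrives in the event and everything it produces leaves in the return value. The `cart` and `item` field names are hypothetical; in a Step Functions workflow the return value would become the input of the next state.

```python
def handler(event, context):
    """Pure business logic: state arrives in the event and leaves in the
    return value, so the function never reads from a data store.

    The field names (cart, item) are hypothetical.
    """
    cart = list(event.get("cart", []))  # copy, do not mutate the input
    cart.append(event["item"])
    return {"cart": cart, "item_count": len(cart)}
```

Because there is no I/O inside the handler, it is also trivial to test locally with plain dictionaries.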
Jeremy: Yeah, no, it does. Because I'm a huge fan of doing that where you're injecting the state along with your payload. I love JSON Web Tokens now if you're doing something at an API level because you've got that signed bit of information where even if it's just a user ID or something like that, that's passed in, it's verified. You know that, that token is valid. And you can use that ID as a way to save data or whatever. It's a much more optimized way of doing that if you don't have to make that separate I/O call. Awesome. All right. So, again, I'll ask again, did we miss anything? Again, just so much information here, but I think we covered pretty much all of it.
Alex: Yeah, I think we are good. New things might come up in the future. No spoilers, of course. And if we missed something, maybe ping us on Twitter or LinkedIn. I'm happy to learn from the community as well.
Jeremy: Right. So, if something comes up, and people want to get a hold of you, or they want to learn more about power tuning, Lambda Power Tuning and stuff like that, how do they do that?
Alex: So, you can find me on Twitter, Alex Casalboni, we'll add a few links, and also LinkedIn. Those are the two platforms I use the most. And the project is on GitHub. We are going to add a link as well, I think. And I might actually go and write a blog post with actual images about all of these, especially all the different patterns and the different visualization scenarios. So wait for it.
Jeremy: Awesome. All right. And then you've got your website, alexcasalboni.com. You write on DEV.to. You write on Medium. There's a test function/pattern thing. I'll include that in the show notes as well. Alex, thank you so much. Awesome information as always. Hopefully, I will get to see you in person again at some point. Probably not this year, but maybe next year once we have 2020 in our rear view. But thanks again, Alex, I really appreciate you being on.
Alex: Thank you very much, Jeremy. And thanks for all you're doing for the serverless community.