February 1, 2021 • 81 minutes
On this episode, Jeremy recaps serverless announcements from AWS re:Invent 2020 with help from AWS Heroes Yan Cui, Serhat Can, Luca Bianchi, Farrah Campbell, Ran Ribenzaft, Ben Ellerby, Sheen Brisals, and Gillian Armstrong.
Gillian Armstrong is a Solutions Engineer at Liberty IT and an AWS Machine Learning Hero.
Luca Bianchi is the CTO of Neosperience, co-founder and co-organizer of many serverless community events in Italy, and an AWS Serverless Hero.
Sheen Brisals is the Senior Engineering Manager at The LEGO Group, a serverless speaker, and an AWS Serverless Hero.
Farrah Campbell is the Alliances & Ecosystem Director at Stackery, a co-organizer of ServerlessDays Virtual as well as several other serverless community events, and an AWS Serverless Hero.
Serhat Can is the Technical Evangelist at Atlassian and an AWS Community Hero.
Ran Ribenzaft is the CTO at Epsagon and an AWS Serverless Hero.
Watch this episode on YouTube: https://youtu.be/tENIWFp3uj8
Jeremy: Hi everyone, I'm Jeremy Daly, and this is Serverless Chats. Today we have an absolutely amazing episode for you. It's February 2021, and unless they decide to stick in another week of videos, re:Invent 2020 is finally done. So, I figured, why not do an episode where we can get the best input and the best insights from some of the most amazing people in serverless?
So, today, I have eight AWS heroes with me, and we're going to talk about all the amazing things that happened at re:Invent 2020. So, I'm just going to go quickly around the horn here and introduce everybody. So, first up, is AWS Serverless Hero, Independent Consultant, Developer Advocate at Lumigo, host of the "Real World Serverless Podcast," The Burning Monk himself, Mr. Yan Cui.
Yan: Hey guys.
Jeremy: All right, next we have an AWS Community Hero, he's a Technical Evangelist at Atlassian, and the guy who once helped me stalk Werner Vogels, just so we could get a photo with him, Mr. Serhat Can.
Serhat: Hey folks, happy to be here.
Jeremy: And next is another AWS Serverless Hero. He's the CTO of Neosperience, co-founder and co-organizer of just about every serverless community event in Italy, the Italian Stallion of serverless, Mr. Luca Bianchi.
Jeremy: All right, next, we are joined by yet another AWS Serverless Hero. He's also the CTO at Epsagon, and way too smart for his own good, the man they call Mr. Observability, Ran Ribenzaft.
Ran: Hey, everyone.
Jeremy: All right, moving on, we have another AWS Serverless Hero. He's the VP of Engineering at Theodo, editor of "Serverless Transformation," and an excellent Nashville, Tennessee, drinking buddy, Ben Ellerby.
Ben: Hey Jeremy, thanks for having me.
Jeremy: All right, next up, another AWS Serverless Hero. She is the Alliances and Ecosystem Director at Stackery, a co-organizer of ServerlessDays Virtual and several other serverless community events, my good friend, the amazing Farrah Campbell.
Farrah: Hi everybody, thanks for having me, Jeremy.
Jeremy: All right, next we have an AWS Serverless Hero and Senior Engineering Manager at The LEGO Group. He's a serverless speaker, an amazing writer, and an all-around awesome guy, Mr. Sheen Brisals.
Sheen: Hey, everyone. Thank you, Jeremy.
Jeremy: And, finally, we have an AWS Machine Learning Hero, to round out the panel. She's a Solutions Engineer at Liberty IT, and co-conspirator in the Werner stalking incident with Serhat and me, the absolutely brilliant, Gillian Armstrong.
Gillian: Thanks, Jeremy, it's good to be here.
Jeremy: All right, so we have eight amazing people right now, with all kinds of knowledge they can drop. What we're going to do is go through each person. I'm going to give you the floor, and I want you to introduce something interesting from AWS. Whether it was an announcement at re:Invent or something else that happened, just tell me about it. So, let's start with Yan. What's your favorite thing that happened at re:Invent?
Yan: Yeah, sure. I think the biggest announcement at re:Invent for me was Aurora Serverless V2, which is really a marvel when it comes to engineering excellence. With Aurora Serverless V2, the instance scaling happens in really fine-grained steps, both in how quickly and how much it scales up. It solves a lot of the problems Aurora Serverless V1 had that kept people from using it in production. With V1, things just take forever to scale up, and capacity goes up in big doubling steps. In terms of cost, that's not really pay-as-you-go, because you're doubling your costs straight away.
Yan: So, I'm quite excited to see what happens once Aurora Serverless V2 goes GA.
Jeremy: Yeah, and what do you think about the cost aspect of that? Because when you look at the ACUs, they've actually doubled the price of those. But do you think the benefit of all those features, plus the fact that it scales instantaneously, makes it a good trade-off from a financial standpoint?
Yan: I think we still have to see what happens once people start using it in production. But they've reduced the units that you scale up by. Before, you had to scale from 4 to 8, to 16, to 32, and that means you end up spending a lot of money on ACUs you're not going to use. If you're just over 8 ACUs, say you need 8.5, you still get 16. And the fact that V1 takes longer to scale down also means you end up paying for overhead you don't need for longer. So, besides all the extra features you get, the fact that it's more expensive per ACU doesn't necessarily mean you end up paying more overall, because you cut out a lot of the waste.
So, we still have to wait and see what actually transpires in production. But it can now scale up faster, which unlocks use cases you couldn't support before, because it just took too long to scale up. And there are some extra features. I've got some notes here. There was some stuff missing before, like global database, IAM auth, and Lambda triggers, and all of those are now available in Aurora Serverless V2, where they were missing in V1. So, between that and the fact that you cut out a lot of your waste, I suspect it's going to make a lot more financial sense to go with V2 going forward.
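To make Yan's point about waste concrete, here is a rough Python sketch that compares billed ACU-hours under V1-style doubling against V2-style fine-grained steps. The 0.5 ACU step, the power-of-two V1 ladder starting at 1 ACU, and the hourly demand profile are simplifying assumptions, and the model ignores V1's slower scale-down, which would add further waste on the V1 side.

```python
import math


def v1_capacity(demand_acu: float) -> int:
    """V1-style scaling (simplified): the smallest power of two that covers demand."""
    capacity = 1
    while capacity < demand_acu:
        capacity *= 2
    return capacity


def v2_capacity(demand_acu: float, step: float = 0.5) -> float:
    """V2-style scaling (assumed 0.5 ACU increments): round demand up to the next step."""
    return max(step, math.ceil(demand_acu / step) * step)


def acu_hours(hourly_demand, capacity_fn) -> float:
    """ACU-hours billed if capacity tracked demand perfectly, hour by hour."""
    return sum(capacity_fn(demand) for demand in hourly_demand)


# Hypothetical demand profile, in ACUs, for four hours.
demand = [8.5, 3, 20, 1]
v1_total = acu_hours(demand, v1_capacity)  # 16 + 4 + 32 + 1 = 53
v2_total = acu_hours(demand, v2_capacity)  # 8.5 + 3 + 20 + 1 = 32.5

# V2 is cheaper overall only if its per-ACU price is below this multiple of V1's.
break_even_price_ratio = v1_total / v2_total
```

With this made-up profile the break-even ratio is about 1.63, so a per-ACU price roughly twice V1's would still make V2 more expensive here; demand that sits just above a power of two pushes the ratio up. That is why, as Yan says, the real answer depends on production traffic.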
Sheen: I'm curious, by the way, why it's named V2, and not just, "Hey, we updated Aurora and now it's better." It's the first time I've seen them call something V2. Is something different happening behind the scenes? What's the reason?
Luca: Or, maybe Aurora Truly Serverless, or something like this.
Yan: I think it's because it's a completely separate offering, and you can use V1 and V2 side by side. So, it's not a complete replacement of Aurora Serverless, but a reimagining of how Aurora Serverless should work, which is why it's V2. And you can actually have both V1 and V2 in the same cluster. I'm not sure how well that's going to work in production in terms of capabilities; some stuff will work with the new features, like Lambda triggers, and some won't. I'm not sure how that plays out in the real world, but at least you have the option of running the two versions side by side, which might make migration easier. You can gradually move stuff over, as opposed to stopping one day, taking the downtime, bringing up a new V2, and then seeing what happens.
Jeremy: Right, and thinking about some of those use cases, did anybody have any thoughts on maybe when people could say, "With this level of scalability now, what use cases could I maybe start migrating to serverless, that I couldn't before?"
Ran: I would think that at the lower scales, Aurora Serverless really makes sense, because honestly, you don't care as much about the infrastructure, you do care about every bit you're paying, and you do care about instantaneous scalability. At the other end, the bigger scales, billions or hundreds of billions of requests or rows every day, serverless probably wouldn't be the best fit. That's my opinion.
Yan: I think we have to factor in how much engineering expertise you have to actually do that yourself. One thing that always gets left off the table is, "Well, if I have to hire somebody to run this for me on containers, and that costs me $20,000 a month, how much am I actually saving if I save $10,000 a month on the AWS bill and spend double that on staffing?" So, if you've already got the expertise, then yeah, I think you're absolutely right. But if you have to hire that expertise externally, you have to factor in the total cost of ownership, not just your AWS bill.
Ran: The other question is when should I choose RDS and when should I choose Aurora now?
Ran: Unless I want an Oracle database or something else that's traditionally old, for MySQL or Postgres, why should I even bother with RDS anymore?
Jeremy: It's a good question, does anybody have an answer to that?
Yan: I mean, Aurora does give you some really amazing features that you'd otherwise have to build yourself, which are quite hard to configure and set up. Things like global database, Lambda triggers, and IAM authentication, which I still have a problem with, because you still need a root user, so you still have the security overhead of maintaining that and dealing with it. But still, Aurora has a bunch of unique features over, I guess, a normal RDS? What do you call it? Just a non-Aurora RDS?
Yan: Aurora has also got some pretty crazy performance. The last time I saw some benchmarks, someone was able to get up to 10,000 ops per second or something like that on Aurora, which is quite significant. It's not easy to get that level of throughput on RDS.
Serhat: I think you can look at it from a different angle. When Aurora Serverless V2 becomes popular, it will give you an extra option besides DynamoDB for those who are more into SQL. Right now there are use cases where people just stuff everything into Dynamo, because it's the go-to option they have. With V2, that will loosen up a bit. Earlier this week, an engineer came to me with a design where she wanted to keep certain audit data in DynamoDB, but when I looked at the queries she had to perform on the data, I knew it was too much for DynamoDB.
Serhat: So, these are the situations: not necessarily millions or billions of transactions, but that sort of flexibility and ease of use without compromising performance. Those aspects will come into play eventually, once it becomes popular.
Jeremy: Yeah, and I love using RDS. One of the major ways I've used Aurora Serverless recently was just to store data off of DynamoDB streams so that I could query it. But that was more for the admin side of things, and less for a front-end use case. I think that's a super interesting thing you can do: "Look, if I can mix and match those two databases, I get the operational performance and stability of Dynamo, and I can also expand my query capabilities with something like Postgres or MySQL." I think that's an interesting way to do it.
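Jeremy's pattern of copying DynamoDB stream records into a relational database for ad hoc queries can be sketched roughly as follows. This is a minimal Python illustration, not his implementation: SQLite stands in for Aurora or Postgres so the example runs anywhere, and the `orders` table, its `pk`/`status`/`total` attributes, and a stream configured with `NEW_IMAGE` or `NEW_AND_OLD_IMAGES` are all assumptions. The record shape (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage`) follows the DynamoDB Streams event format delivered to Lambda.

```python
import sqlite3


def plain(attr: dict):
    """Convert a DynamoDB-typed attribute ({"S": ...}, {"N": ...}, ...) to a Python value."""
    if "S" in attr:
        return attr["S"]
    if "N" in attr:  # numbers arrive as strings
        text = attr["N"]
        return float(text) if any(c in text for c in ".eE") else int(text)
    if "BOOL" in attr:
        return attr["BOOL"]
    if "NULL" in attr:
        return None
    raise ValueError(f"unsupported attribute type: {sorted(attr)}")


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical read-model table for the admin queries."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, status TEXT, total REAL)")


def apply_stream_batch(conn: sqlite3.Connection, event: dict) -> None:
    """Mirror INSERT/MODIFY/REMOVE records from one stream batch into the orders table."""
    with conn:  # commit the whole batch as one transaction
        for record in event["Records"]:
            change = record["dynamodb"]
            order_id = plain(change["Keys"]["pk"])
            if record["eventName"] == "REMOVE":
                conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))
            else:  # INSERT or MODIFY: upsert the latest image
                image = change["NewImage"]
                conn.execute(
                    "INSERT OR REPLACE INTO orders (id, status, total) VALUES (?, ?, ?)",
                    (order_id, plain(image["status"]), plain(image["total"])),
                )
```

`INSERT OR REPLACE` is SQLite syntax; on Postgres the equivalent upsert would be `INSERT ... ON CONFLICT (id) DO UPDATE`.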
Ben: Especially with AWS Glue Elastic Views, which came out at re:Invent as well, we'll be able to have the same data replicated across multiple data stores. I've not played around with it yet, but I think we'll be able to replicate application data from DynamoDB to a relational database or other DynamoDB tables, and have data scientists interact with that relational database directly. At the moment, I'm mostly using S3 as a sort of serverless data lake, and then using Athena to query on top of that. But it's nowhere near as powerful as giving a data scientist direct access to a relational database.
Jeremy: Right, right.
Yan: There's also streaming stuff from DynamoDB Streams to Elasticsearch, which is such a common use case. I've built this like three times in the last six months for different projects. It would be so much easier if I could just point an Elastic View between DynamoDB and Elasticsearch. That'd be awesome.
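The DynamoDB-to-Elasticsearch glue Yan describes usually comes down to turning stream records into an Elasticsearch `_bulk` request body: newline-delimited JSON with an action line per document and a trailing newline. A rough Python sketch, assuming a string partition key named `pk` and a hypothetical `orders` index; the attribute flattening is deliberately naive (it keeps DynamoDB's string-encoded numbers as strings and does not handle nested maps or lists).

```python
import json


def to_bulk_body(records, index: str) -> str:
    """Build an NDJSON body for POST /_bulk from DynamoDB stream records."""
    lines = []
    for record in records:
        change = record["dynamodb"]
        doc_id = change["Keys"]["pk"]["S"]  # assumes a string partition key named "pk"
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
        else:
            # Naive flattening: {"status": {"S": "NEW"}} -> {"status": "NEW"}
            doc = {name: next(iter(typed.values())) for name, typed in change["NewImage"].items()}
            lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline
```

The resulting string would be sent with `Content-Type: application/x-ndjson`. Using the table key as `_id` means a replayed stream record overwrites the same document rather than creating a duplicate.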
Jeremy: Yeah, the problem with Elasticsearch is that it's not Serverless yet, right? So, we're still managing those clusters. And, 99 percent of it's done for you, but you still have to sort of think about it. So, any other thoughts on Aurora Serverless V2?
Yan: I guess one thing worth mentioning is that, at least right now, before GA, there's no Data API yet. That means, from a serverless point of view, you still have to worry about managing socket connections, pooling, all of that, and maybe bringing in a database proxy. But I imagine that by the time it goes GA, they should have added support for the Data API, because that was quite a big game-changer for Aurora Serverless.
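For context on why the Data API matters: a function sends SQL over HTTPS instead of holding a database connection, so there is no pool to manage. A minimal Python sketch of a parameterized query through the `rds-data` client's `execute_statement` call, as it worked with Aurora Serverless V1 (per Yan, V2 did not support it at preview time). The client is passed in so the sketch can be exercised with a stub; in a real function it would be `boto3.client("rds-data")`. The `app` database, `users` table, and ARNs are hypothetical.

```python
def find_user(rds_data, cluster_arn: str, secret_arn: str, user_id: int):
    """Query Aurora through the Data API; return rows as lists of plain values."""
    response = rds_data.execute_statement(
        resourceArn=cluster_arn,
        secretArn=secret_arn,  # credentials come from Secrets Manager, not a connection string
        database="app",  # hypothetical database name
        sql="SELECT id, email FROM users WHERE id = :id",
        parameters=[{"name": "id", "value": {"longValue": user_id}}],
    )
    # Each field is typed, e.g. {"longValue": 1}, {"stringValue": "a@example.com"}, {"isNull": True}.
    return [
        [None if field.get("isNull") else next(iter(field.values())) for field in row]
        for row in response.get("records", [])
    ]
```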
Jeremy: Right, definitely. All right, let's move on. Let's go to Luca, what was your favorite, re:Invent, announcement?
Luca: Yeah, my favorite announcement is formal support for container images in Lambda. This was very game-changing for machine learning practitioners, but also for a number of other use cases. Now we can define all the dependencies of a Lambda function in a standard format, a container image, that can be deployed to Lambda. It's not direct support for running containers, so you can't deploy containers as such, but you can describe, in the same format, everything you want packaged with your Lambda, and this is great. It lets you get around a lot of constraints. It lets you implement tree-shaking or dependency pruning, so you can optimize the package that gets deployed to Lambda. You could do this before, but you needed some kind of Serverless-specific plugin, or you had to do it yourself using CDK or whatever, and it wasn't straightforward. Now it's completely direct: you just specify the format.
The other very important thing is that it uses a standard format. There are a lot of Kubernetes and container DevOps engineers who can now bring their workloads to Lambda using exactly the same format. Not every feature of the format is supported. For example, you can't open arbitrary ports to expose services or manage connections into the container. But the subset that is supported is really great, because you can even run scripts on the build machine when you're building the package.
Moreover, with container support, everything you package to make your image work, and I'm referring to the Lambda runtime environment that can be packaged in the container, lets you do two things. The first is that you can choose your own base image. You can start from any available Docker image: Ubuntu, Red Hat, Fedora, or whatever. You can start from, say, a Python image that already bundles a particular Python runtime, or an image with Linux utilities such as ImageMagick or FFmpeg. So, you choose your base image, and you bundle the runtime interface client with it. The second is that this has been released as open source, so you can also use it to test locally as part of your development cycle. That's great, because you can run Docker on your machine, deploy that image, make Lambda calls to that runtime environment, and test before even pushing to the cloud.
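The local loop Luca describes can be sketched like this. When an image built from one of the AWS-provided base images (which include the Runtime Interface Emulator) is started with something like `docker run -p 9000:8080 my-function-image`, the emulator accepts invocations over HTTP at the path used below. The image name and port mapping are assumptions for illustration; this Python helper only builds and sends the request.

```python
import json
import urllib.request

# Emulator endpoint when container port 8080 is mapped to local port 9000.
RIE_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_invoke_request(payload: dict, url: str = RIE_URL) -> urllib.request.Request:
    """Wrap an event payload in the POST request the emulator expects."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke_local(payload: dict, url: str = RIE_URL, timeout: float = 30.0):
    """Send the event to the locally running container and decode the handler's JSON result."""
    with urllib.request.urlopen(build_invoke_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read())
```

With the container running, `invoke_local({"hello": "world"})` returns whatever the handler returned, without pushing anything to a registry first.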
Jeremy: Yeah, and I think that's a good distinction too, that it's a packaging format. You're not actually running a container in the Lambda function itself. But, yeah, I mean that opens up a whole bunch of tooling options. So, has anybody seen some feedback on how this helps people that have existing workflows start using serverless?
Luca: Yeah, I think so. And I think Gillian has some nice use cases, because we discussed them before.
Gillian: Yeah, well, I'm mainly looking at it because we're doing machine learning. Obviously, the models are quite big, and shoving them into traditional Lambdas isn't going to work out for you. So, having the extra space is really nice, and it's definitely allowing us to do a bit of experimentation and see what we can do. The system I'm working on has lots of different data extraction, so we need different models at different times. We don't want them all running, just burning cycles when they're not being used, because some of them will be used very infrequently. So, being able to put those in a Lambda is a massive advantage. But I do have some concerns about containerized Lambdas. You're going to see companies where the first time anyone uses a Lambda, it's in a container, and then they never come out of the container, instead of saying, "Well, if I can build this Lambda without the container, I'll start there," and only moving to a container when the use case doesn't work for a straightforward, simple Lambda, where you get a lot more done for you. Containers can be right for some companies, but I'm wondering if some people might come from their container worlds, go into container Lambdas, and never actually build a plain Lambda, when that's actually all they need.
Yan: So, Gillian, I've got a question for you. Have you actually tried loading a large machine learning model in the container image of a Lambda function?
Luca: I tried to load an image model of more than 400 megabytes. It's a good approach if you don't need to update the model and you want some kind of immutable packaging. It works well, but it requires a bit of time while the service optimizes your container. The workflow is that you push the image to Amazon Elastic Container Registry, and then you update the Lambda, or create a Lambda pointing to that image in the registry. When that happens, Lambda starts pulling and optimizing the image, and how long that phase takes depends on the size of the image.
Yan: So, my question for you guys, in that case, is have you measured the latency for reading that model?
Gillian: It's not fast.
Yan: I was talking to someone from IKEA.com who had the exact use case you're thinking about, Gillian. They were trying to load a 1.5 gig machine learning model from a container image, and it took them 4 minutes to do that, in 100-megabyte chunks. That means if you want to load a 10 gig model, a full 10 gig container image, you won't have time; Lambda's going to time out before you can load it. So, that's something I'm trying to figure out: is that just something they were running into, or is it more of a platform problem that they haven't figured out yet?
And the design trade-off they made for container images is that, no matter how big the image is, it gets broken up into small chunks in a sparse file system. That's how they're able to limit the cold start penalty for loading a container image. But that means if you actually need to load lots of files, or one large file, from your container image, well, good luck. It's going to be pulling small chunks from whatever the underlying file system is.
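Yan's timeout concern can be checked with back-of-the-envelope arithmetic, and his question about measuring model-load latency with a simple timer. A rough Python sketch: `measure_read` times reading a binary stream in 100 MB chunks (mirroring the chunk size he mentions), and `projected_load_seconds` extrapolates from the 1.5 GB in about 4 minutes he quotes. Linear scaling with size is an assumption; the 900-second figure is Lambda's 15-minute maximum timeout.

```python
import time

LAMBDA_MAX_TIMEOUT_S = 15 * 60  # Lambda's maximum function timeout, in seconds


def measure_read(stream, chunk_bytes: int = 100 * 1024 * 1024):
    """Read a binary stream to the end in fixed-size chunks; return (bytes_read, seconds)."""
    start = time.perf_counter()
    total = 0
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        total += len(chunk)
    return total, time.perf_counter() - start


def projected_load_seconds(size_mb: float, observed_mb: float, observed_seconds: float) -> float:
    """Assume load time scales linearly with size, based on one observed load."""
    return size_mb * observed_seconds / observed_mb


# 1.5 GB in roughly 4 minutes, extrapolated to a 10 GB image.
estimate = projected_load_seconds(10_000, 1_500, 4 * 60)  # 1600.0 seconds, about 27 minutes
fits_in_one_invocation = estimate < LAMBDA_MAX_TIMEOUT_S  # False
```

In a function, `measure_read` could wrap a file opened during initialization (for example a hypothetical `/opt/model/weights.bin`) and log the result, which gives real numbers to compare against the extrapolation.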
Gillian: I want to talk to someone who gets 10 gig in their Lambda, I think you get a prize if you manage to get a 10 gig Lambda, and it's like successful and working well.
Jeremy: I'm also wondering about the increased cores, now that you can have 10 gigabytes of memory. Does that help at all with loading these larger things? Or are you still just limited by the network?
Yan: Yeah, I wonder that as well, because even before 10 gig Lambda functions, you also got the full CPU while running initialization. I don't know how well that applies to container images. And I guess one thing that's also worth mentioning with container images is that you're now responsible for the security, updates, and patching of the OS, which is something 95 percent of us don't want to do.
Jeremy: Right. And are we blurring the lines even more? Farrah, I know at Stackery you're working with a lot of customers building with SAM, but also with CloudFormation, the Serverless Framework, and some of these other things. Is this something you're seeing, where people are super excited because they think, "Hey, now I can just use containers on Lambda functions"? Or is this just getting too confusing?
Farrah: I think it definitely excites people. What it brings is the opportunity to reuse the images and tooling you already have to validate, build, and deliver Lambda functions, where previously you would have had to set up a whole toolchain.
Farrah: I think it's also helping people because you don't have to jump into serverless head first. You can take incremental approaches to modernizing your application. And you really see it fitting into how people are already working. I think we see Amazon really trying to figure out ways to integrate with tooling that's already there, with workflows and patterns that are already there. EventBridge has over 140 SaaS integrations now, and the Lambda Extensions API does the same thing. There's a lot of confusion about when you should use this, or maybe where to use Fargate. How is this going to be billed? Does this support extensions? Does it support layers? So, there are still a lot of questions, but I do think it's really moving toward figuring out how to help developers with their current workflows, and how to speed that along and make it a little more seamless.
Yan: It doesn't work with layers, I've checked. It doesn't work with layers, unfortunately.
Jeremy: It doesn't work with layers?
Yan: It doesn't.
Luca: Nope, not yet.
Jeremy: Well, I'm sure it will eventually, right?
Yan: Probably not, because layers are attached to the file system you already have.
Jeremy: That's a great, very good point.
Yan: But, if the file system is a container, then where are you going to attach it?
Jeremy: Right. Well, I think the other important thing to remember here is that using a container as a packaging format is not an AWS innovation. IBM is already doing this, and so is Azure. A lot of other cloud providers have done this before, but as you said, Farrah, it certainly does help people move in that direction. But I also fear what Gillian said, that maybe people just get stuck there. Still, it's hard to fight the gravity of containers' popularity right now.
Yan: Yeah, and I also think 90, 95 percent of people just don't need containers. It's like all these new features they keep adding, EFS, extensions, all this other stuff. They're medicine for specific symptoms and problems, not something you should take every day.
Jeremy: Totally agree, totally agree. All right, let's move on. Serhat, what was your favorite re:Invent announcement?
Serhat: One of my favorite re:Invent announcements was one-millisecond billing. I know a lot of people are really excited about this, and I can't stress the importance of this change enough. When you look at it, AWS probably lost a ton of money overnight, and I know friends who saved a lot of money overnight. This shows how customer-focused AWS is. And it's probably not just about the money. In our previous use cases, even if we ran functions with more CPU and RAM and saw execution times well under 100 milliseconds, we were still paying for 100 milliseconds. Now we don't have to, which means we can run our functions faster and cheaper.
That also means people who thought moving to Lambda would cost them much more, and that they'd lose some performance, can now choose the highest memory if they want to and pay only for the execution time they actually use. This enables a lot more use cases. And because there are more use cases you can run on Lambda, in many cases you don't need another container management service, or EC2, or whatever, to run your whole service.
Because there are definitely cases where you need to be fast, and then you start thinking about cold starts, cost issues, and a lot of other things. Then you start using containers, EKS, whatever, alongside AWS Lambda, and your whole operation becomes a mess, right? So, it's not just about Lambda getting cheaper; it's also about enabling more use cases.
Sheen: I'm curious, by the way, whether there's any case where it doesn't get cheaper. Is there any math behind this showing it could get more expensive?
Jeremy: Maybe, but only if you move more workloads onto it. One thing I noticed about one-millisecond billing is that it sounds really great in theory, and I'm a huge fan of it. When we were out at AWS maybe two years ago, I said, "Could you make it maybe 50 milliseconds?" Even that would be better than 100 milliseconds, and they went all the way down to one millisecond, which is great. But you do pay an invocation cost, right? So, even if your Lambda function only runs for 10 milliseconds, you're paying that invocation cost. If you're invoking more functions because you no longer have to squeeze multiple operations into a single function as an optimization, I wonder if you're still paying a little more because of those invocation costs.
Yan: Yeah, I guess if you were batching before, but now you're processing one record at a time, you end up paying more of that 20 cents per million requests, compared to what you save with millisecond billing.
Jeremy: I'm sure you could just turn your couch over and find enough change to pay that bill anyway, because it's so incredibly low. But for me, one-millisecond billing matters for operations where maybe you're polling, or you need to call an API. If it takes 34 milliseconds to call the API, having to pay for an extra 66 milliseconds just seems excessive. And if you do that millions and millions of times, it does start to add up.
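Jeremy's 34 ms API call makes a handy worked example of 100 ms versus 1 ms rounding, and Yan's batching caveat shows up in the request fee. A rough Python sketch: the $0.20 per million requests comes from the conversation, while the per-GB-second price is an assumption based on the x86 list price in us-east-1 at the time, so check current pricing before relying on the numbers.

```python
import math

PRICE_PER_REQUEST = 0.20 / 1_000_000  # $0.20 per million requests
PRICE_PER_GB_SECOND = 0.0000166667  # assumption: x86 list price in us-east-1


def billed_ms(duration_ms: float, granularity_ms: int) -> int:
    """Round a duration up to the billing granularity (100 ms before, 1 ms after)."""
    return math.ceil(duration_ms / granularity_ms) * granularity_ms


def cost(invocations: int, duration_ms: float, memory_mb: int, granularity_ms: int) -> float:
    """Duration charge plus request charge for a number of identical invocations."""
    gb_seconds = invocations * (billed_ms(duration_ms, granularity_ms) / 1000) * (memory_mb / 1024)
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST


# Ten million 34 ms API calls at 128 MB, before and after 1 ms billing.
before = cost(10_000_000, 34, 128, granularity_ms=100)
after = cost(10_000_000, 34, 128, granularity_ms=1)

# Yan's batching caveat: the request fee alone for 10M records,
# one record per invocation versus batches of 10.
one_per_call_fee = 10_000_000 * PRICE_PER_REQUEST  # about $2.00
batched_fee = 1_000_000 * PRICE_PER_REQUEST  # about $0.20
```

At this memory size the request fee is a large share of the total, which is the trade-off Jeremy and Yan describe: shorter billed durations help, but splitting work into more invocations adds request charges back.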
Yan: I guess my worry with millisecond billing is more that there's now an excuse for people to prematurely optimize, because they want to cut 15 milliseconds of execution time, when in fact, over the course of a month, they're paying 0.02 cents for the whole thing.
Yan: You're spending hundreds of dollars of engineering time on something you'll never get your money back on.
Yan: So, that's more my concern about millisecond billing. Before, there was no point, because whatever you did, you'd pay for 100 milliseconds anyway. But now there's the argument, "Yeah, we can save you some money."
Jeremy: So, Sheen, I know over at LEGO your team has been a fan of, not Lambda-liths, but sort of a fat Lambda, as you've referred to them in the past, optimizing them because they do multiple synchronous things together. Have you seen a reduction in costs now that you're getting one-millisecond billing?
Sheen: Yes, I mean, things change. I was going to say, actually, that I myself and many people said, "Oh, with millisecond billing, Lambda functions are going to become single-purpose, et cetera." But when you have a team of engineers writing Lambda functions all day, they don't really look at things that way; they just continue as usual. Since the early days of the fat Lambda, or Lambda-lith, I think that's changed; now it's more linear and single-purpose. But even so, when you write a Lambda function these days, it's not just doing a few things and quitting. You have structured logging in place, you have a bunch of config getting loaded, a bunch of things from Parameter Store, you have layers, and this and that.
So, ultimately, there's so much on top of a simple Lambda function; that's the other side of this argument. Yes, it does benefit, but in my opinion, engineers aren't looking at it that way in their day-to-day Lambda development.
Jeremy: Right, and I think actually what Yan said too, about prematurely optimizing. I think most developers aren't thinking about costs still, even as we move to this serverless world, people are just kind of building their applications, and when it comes to speed, like maybe getting the latency down, and things like that, those are decisions they'll make. But saying, "Oh, well, we want to shave 8 cents off of our monthly Lambda bill." I think that's still outside the normal view of your average developer right now.
Sheen: Yeah, and the counter to the optimization argument is another thought: "Now that it costs less, why don't we bump up the RAM a bit to get more performance?" So, they end up paying more. There are always two sides to this.
Jeremy: Yeah, and I think the other thing is that Alex Casalboni's Lambda Power Tuning tool just got a lot more complicated, because there are so many more options. All right, any other thoughts on one-millisecond billing?
Ben: Yeah, I'd just like to add to that, because our development teams are a bit weird, Jeremy. We do put a focus on reviewing our billing data every week, more from an application-understanding point of view, and to validate that as things scale, the cost stays under control. All of our teams get this every sprint as part of the Scrum process, so in the review they look at the AWS bill and make decisions based on it. For us, Lambda is never top of the list; optimizing things like API Gateway is probably a better use of time. Although Lambda can be a higher cost, depending on your use case.
Sheen: That's good to know, Ben, because I think I can take that back to our teams. I think that's very important, because often engineering teams, they never get to see the production billing or anything. That's something that's really useful, yeah.
Yan: Yeah, I think in practice, I see most people spend more money on things like API Gateway and CloudWatch. CloudWatch and X-Ray, well, maybe not X-Ray, but definitely CloudWatch: anything CloudWatch is usually pretty high up in the list of things that cost a lot of money.
Jeremy: All right, awesome. So, let's go to Gillian. What was your favorite announcement?
Gillian: Well, there were lots of cool announcements, but Step Functions is definitely my favorite serverless orchestrator. I use it for lots of different things. And although Lambda is the duct tape that you can use to fix any problem, and you can stick it between anything to get anything stuck together, I do like to see things simplifying. So, seeing things like synchronous Express Workflows, and being able to go straight from API Gateway to a Step Functions workflow, or straight to API Gateway from a Step Functions workflow. And I know you can get a nice circular thing going on there. So, it means not having to put Lambda in between. Obviously, you could've used a Lambda to call API Gateway from a step function, but now you can put it straight in, and the best code is no code. So, being able to really simplify what you're putting together, simplify workflows, and make your applications much, much easier, with much less code and fewer moving parts, I think that's pretty cool.
Jeremy: Right, and as they always say, the most dangerous part of your application is the code that you write. Right? Everything else is sort of battle-tested and already there. And that synchronous workflow ties in very nicely, I think, with the 1 millisecond billing. Because now, with synchronous Express Workflows, you can do function composition, and you can actually have several Lambda functions that run back-to-back-to-back-to-back as part of a synchronous workflow, and you're not paying for 100 milliseconds every time. Or the exorbitant state transition fee that you pay with normal Step Functions.
Yan: I think you're going to have a much bigger problem with cold starts if you do that, though, because the idea of using synchronous Express Workflows is to attach them to API Gateway and stuff like that. If there's an API and it's user-facing, you definitely don't want to have, like, five Lambda functions cold starting one after another. That's not going to be great for your user experience.
Jeremy: That's probably true. I'm wondering, though, if it's one of those things where, like regular Lambda functions, once you get them warmed up, it allows you to do certain things. And if it's happening behind the scenes asynchronously, then cold starts wouldn't be that big of a deal. But if you potentially do need multiple things, like multiple APIs called to bring back a single response or something like that... I don't know, I think that if they can speed it up, this could be the solution to the function composition problem.
Yan: What I'd like to see is for them to introduce some kind of scripting language. Some kind of VTL, well, maybe not VTL... some kind of templating thing, where they can essentially just execute a script without it being a separate Lambda function. That would remove a lot of the performance concerns that I would have. Like I said, cold starts may not be that big an issue, but they do make your worst-case performance a lot worse when you've got them stacking up one after another, because it's all in the same workflow.
Gillian: It'd be great if it just warmed everything up at the start. It knows all the Lambda functions that are in the Step Functions workflow, so it could just warm them up right away.
Yan: I mean, you could use provisioned concurrency, but then it becomes interesting cost-wise; it becomes another thing you've got to worry about.
Jeremy: Right, and expensive. And I think that's actually a good point, just this idea, and maybe going back to what you were saying, Sheen, about the fat Lambda and some of these other things. I love single-purpose functions, I think they make a ton of sense, but on the other side of things, if you do have multiple steps that have to happen, having those all run in a single Lambda function sometimes makes a lot of sense too. I think AWS is pushing people towards individual Lambda functions. And, look, having a Lambda function that does one simple thing is great, because you can compose those, they can be reused, and things like that. But that only works if they're well-coordinated with Step Functions and you understand how all that stuff works.
And then, on the other side, like you said, Yan, you pay that penalty of cold starts, unless that's something that gets better over time. So, a question for everybody: where are you? Are we still in the fat-Lambda-if-need-be camp? Or have we all moved towards single-purpose functions?
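To make the billing point concrete, here is a small Python sketch of how billed duration adds up when several short functions run back-to-back. It assumes the old model rounded each invocation up to the next 100 ms and the new model rounds up to the next 1 ms; the step durations are made-up numbers, and memory size and per-GB-second price are left out because they scale both totals equally.

```python
import math
from typing import Iterable


def billed_ms(duration_ms: float, granularity_ms: int) -> int:
    """Round one invocation's duration up to the billing granularity."""
    return math.ceil(duration_ms / granularity_ms) * granularity_ms


def chain_billed_ms(durations_ms: Iterable[float], granularity_ms: int) -> int:
    """Total billed time for functions that run back-to-back in one workflow."""
    return sum(billed_ms(d, granularity_ms) for d in durations_ms)


# Hypothetical durations (ms) for five short steps in a synchronous Express Workflow.
steps = [12.4, 8.1, 31.0, 5.6, 19.9]

print(chain_billed_ms(steps, 100))  # 100 ms rounding -> 500
print(chain_billed_ms(steps, 1))    # 1 ms billing -> 79
```

With these hypothetical durations, every step under the old model bills a full 100 ms, so the composition costs roughly six times as much duration as the work actually took.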
Luca: I've seen two different approaches from people. On one side, you have people who are more cautious about serverless, preferring to embrace managed services and whatnot, and they tend to embrace fat Lambdas or Lambdaliths as much as they can. On the other side, you have a lot of people who are enthusiastic about the possibility of packaging very small pieces of code within a Lambda, and they would Lambda-ize everything. We have some difficulty striking the balance between them, because my problem is not about having a huge Lambda or a fat Lambda. It's related to the fact that it pushes developers to adopt some bad practices: okay, having just one crude service with everything inside, and you package everything within the Lambda, and maybe you just use 3 megs of it, but it's super simple to package everything within your Lambda, and who cares about that?
But when you go into production and you measure cold starts, you get hit hard by the cold starts. And it's also something related to the behavior of the developer. I've shifted more toward having smaller Lambdas, because it encourages you to adopt good patterns and good architectures, but it's not a dogma. It's just something that I like more.
Jeremy: Well, I'm curious to get your perspective, Ran. I mean, you're the one who's looking at these functions being run. Is it easier to observe a single-purpose function, where only one thing's happening, or to parse through those stack traces when you get errors in a Lambdalith?
Ran: It's a bit of both. There are use cases, like a simple exception that you're having, where a monolithic Lambda is very easy to troubleshoot, because everything is in a single location. You can see everything from the beginning all the way to the end. I mean, that's kind of single-node. But on the other side, when it's getting complex, when you have pipelines of 3, 4, 10 functions with different services, different third parties, and API calls, these problems tend to be more complex. And you kind of want to see just an encapsulated problem. Like, the call to Stripe was erroneous, or something changed over time. That's something you can't see in a monolith, because all the logic is internal. Everything happens inside a single function. There is no outbound call that says, "Okay, I analyzed some data, I'm passing it to the other service that's responsible for charging that specific user." Everything happens internally.
So, the upside of having more of a microservices approach, or more event-driven, and breaking the functions into smaller pieces, is that if you have the right tool, you will have greater visibility into what happens. Because you know that at the point when it tried to charge the user, the input was X and something was missing from that specific input. Unlike in a monolith. By the way, at Epsagon we have both cases. We're still stuck with one Flask-based function that is kind of our fat Lambda, but other than that, we've got, I don't know, 600, 700 functions, different services, all trying to be as small as possible. Someday, we will migrate our fat Lambda.
Yan: I think there are still limitations, things like what I find with Kinesis, and probably DynamoDB Streams too. That's the one place where I can't really follow single-responsibility functions as much as I'd like, because you have to contend with constraints on how many subscribers you can have. And, at the same time, there's no filtering, so you end up doing a lot of filtering in your own code if you want to handle just one type of event when you've got an event processor that funnels everything to you in one stream. Besides that, in most other cases I've found single-purpose functions to be ideal. I've seen some clients that just go with fat functions, and I tell you, like Ran said, all the errors you see happen in one function, and you have no idea what's going on.
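The "filtering in your own code" Yan mentions might look like this minimal Python Lambda handler for a Kinesis trigger. The `type` field and the `order_placed` value are hypothetical; only the base64-encoded `kinesis.data` payload comes from the standard Kinesis event shape. (Lambda has since gained event-source filtering for some stream sources, so check current documentation before relying on in-code filtering.)

```python
import base64
import json
from typing import Any, Dict

# Hypothetical event type this function is responsible for.
WANTED_TYPE = "order_placed"


def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    """Lambda handler for a Kinesis trigger that skips records it doesn't own."""
    handled = 0
    for record in event["Records"]:
        # Kinesis delivers each record's payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload.get("type") != WANTED_TYPE:
            continue  # the trigger sends every record, so filtering happens here
        handled += 1  # real processing for WANTED_TYPE would go here
    return {"handled": handled}
```

The cost of this design is that the function is still invoked, and billed, for every record on the stream, including the ones it throws away.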
Jeremy: Lot of console.logs.
Yan: Yeah, everything's in that one log. All the logic in one function, everything in one stream of messages.
Sheen: I agree with what Ran was saying, because initially, when many teams start, they want to put everything together, for cost and many other reasons. But when it gets to production and runs more, that's when the observability problem hits. They want to see what's going on, and that's when reality hits and they wish everything was more granular and isolated, so they'd get more visibility. And that is actually the case, because when you have a production environment, you need to know what's going on. Especially now, with all the different features we have: Lambda destinations, keeping track of errors, and things like that. So, yeah, that's an important thing.
Yan: And there's an example along the lines of Luca's as well. We've got a client that had this one API where one endpoint is doing server-side rendering, and everything else is just a simple get and put from DynamoDB, that kind of stuff. Every single cold start is 1.5 seconds, because of React, because of that one endpoint that's doing server-side rendering. So, that's where having one function is easy, but then you end up paying that cold start for every single endpoint.
Jeremy: Right, and I'm curious about your perspective too, Farrah, because I know you're working with a lot of companies that are doing this stuff. Is that something that you're seeing as well? Is it sort of a mix and match of single-purpose versus Lambdalith/fat Lambda people? And I don't want to fat-Lambda-shame anybody, I certainly don't want to do that. Again, I use them sometimes too, but I'm just curious, what's your experience?
Farrah: Yeah, I think we definitely see a mix of both, but for the companies we work with, the goal is for their environments to become more flexible. That's the whole goal of modernizing. And if you have a big fat Lambda, is your architecture flexible? I'd have to say it probably is not. So, I think we really try to work towards more single-purpose functions. But you definitely see a combination of both.
Jeremy: Awesome. All right, so let's move on. Sheen, what was your exciting announcement from re:Invent 2020?
Sheen: A few things, but one of my favorites is that, with EventBridge, we can now archive and replay events. And I'll tell you why I like this. We started to adopt EventBridge as soon as it came out. Then, on two or three occasions, in different use cases, when I suggested that teams use EventBridge, there was resistance. The simple reason being: what happens if I lose an event? What do I do? Especially when it comes to critical events that, say, carry customer order data, or payment details, and things like that. In such scenarios, you can't just go without any safety net. So, I had to back off in those situations and use EventBridge in other scenarios. But with the archive and replay mechanism, we get a certain level of confidence.
So, okay, you have your events here. If something goes wrong, or the consumer, a target Lambda for example, has issues, it gives us the flexibility to replay those events as and when we need. So, that's an important thing that was missing until the announcement. Now, I had a brief look at the archive and replay setup. It's not completely clear to someone who is new to it, because you may end up replaying and hitting all the targets. You need to be careful that you are archiving for that particular pattern, or that rule that you have. Otherwise, it's not going to make much sense.
And, also, the other important thing many people miss is that when you replay the event, the event comes with an extra attribute, "Oh, I am a replay event," or something like that. So, that, again, is something important to look into when we build our patterns, different rules, and things like that. So, that's why this is one of my favorites, especially when it comes to EventBridge.
Jeremy: Yeah, I was a very early adopter of EventBridge, and I remember the first thing I did was create a rule that captures every event and just send that somewhere so that I had a backup. So, adding this in probably seems like a relatively small thing, but it really does help. And then, like you said, having that little extra flag in there that says it's a replay event is super helpful, because if you're building in idempotency, and some of the other things that you have to do when you maybe reprocess an event that already happened, that's a really good thing to have.
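A minimal Python sketch of a replay-aware, idempotent EventBridge consumer along the lines Jeremy describes. It assumes replayed events carry a top-level `replay-name` field (the extra attribute Sheen mentions; check the EventBridge archive and replay documentation), uses a hypothetical `orderId` in `detail` as the deduplication key, and uses an in-memory set where a real system would need a durable store, such as a DynamoDB conditional write.

```python
from typing import Any, Callable, Dict, Set


def make_idempotent_consumer(
    process: Callable[[Dict[str, Any]], None], seen: Set[str]
) -> Callable[[Dict[str, Any]], str]:
    """Wrap a processing function so repeat deliveries of the same order are skipped.

    `seen` stands in for a durable store in this sketch.
    """

    def handle(event: Dict[str, Any]) -> str:
        key = event["detail"]["orderId"]  # hypothetical business key, not an EventBridge field
        is_replay = "replay-name" in event  # assumed marker on replayed events
        if key in seen:
            return "skipped"
        process(event)
        seen.add(key)
        return "processed-replay" if is_replay else "processed"

    return handle
```

Deduplicating on a business key rather than on the envelope `id` is a deliberate choice here, since this sketch does not assume a replayed event keeps its original ID.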
Now, Ben, I know you're a huge fan of EventBridge, as well, what are your thoughts on those new capabilities?
Ben: Sure, yeah. I mean, we're using EventBridge on nearly all of our projects these days. And, actually, just a couple of days ago, an article went live on the AWS Game Tech blog which talks about our use of EventBridge in the esports space. The company is Gamercraft, so feel free to look at the use case. Archiving is something we've been doing ourselves for a while, so it's great to get that done out of the box. Especially as a lot of our clients are in the regulated space, it's great to have things like archive history out of the box as well. And as we get things like, last year, the encryption-at-rest support, it can start to be used in more regulated industries. Replay's also great, and just before re:Invent, the introduction of retry policies and dead-letter queues means we're getting a lot more robustness straight out of the box.
We're already using it in a lot of our projects. It's not something I'm particularly using it for, but if you're using an event-sourcing-based architecture, archive and replay can obviously be crucial parts of it for you.
Jeremy: Yeah, and another thing that is getting more and more popular, with the way people are building their applications, is splitting up accounts. So, you might have a separate account for each microservice, and then for each stage of that microservice, and things like that. So, cross-account communication with EventBridge is kind of a complicated thing. I don't know if anybody watched Stephen Liedig's EventBridge talk during re:Invent, but it was really, really interesting, and he just recently released some of those patterns in a GitHub repository, too. But, any thoughts on that from people who've had experience with this? It's kind of clunky right now, but hopefully it's getting better.
Yan: It's gotten a lot better already. There are still some problems with it, like the fact that it only delivers to the default bus in the destination account, and stuff like that. But at least you can now use resource policies to control which accounts can access it, so you don't have to make a change in both the account where the bus is and the account where the destination is. Now you can just do everything from the destination account when you add a new subscriber. So that was quite a nice change, and it makes things a lot easier for people doing that multi-account pattern.
Sheen, one thing about the archive and replay you mentioned: what's missing is that when you replay, it just dumps events at you as quickly as it can. It doesn't respect the event ordering. I was talking to the guys at MatHem, a big Swedish online grocery company. They built some tooling around EventBridge, an EventBridge CLI tool, and they implemented replay so that it respects the timestamps and delivers events at the right time, as opposed to just, "Here you go, here's a million events, boom."
Gillian: We're looking at EventBridge and are definitely very interested in starting to use it more and more. So, I'll ask the people who are using it: how do you feel about the observability? That does seem to be a little bit lacking in EventBridge. With logging, even with the new archive, I don't know if you can really use it to query and find out what's happening, and if events haven't been picked up by any rules, you don't really know that they haven't.
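The ordered replay Yan describes can be sketched roughly like this. This is not MatHem's tool; it is a hypothetical Python function that sorts archived events by their `time` field and waits out the original gaps between them. `publish` stands in for whatever puts the event back on a bus (for example, a `PutEvents` call), `speed` compresses the gaps, and `sleep` is injectable so the pacing can be tested without waiting.

```python
import time
from datetime import datetime
from typing import Any, Callable, Dict, Iterable


def _parse_time(ts: str) -> datetime:
    # EventBridge "time" values look like "2021-02-01T12:00:00Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def replay_in_order(
    events: Iterable[Dict[str, Any]],
    publish: Callable[[Dict[str, Any]], None],
    sleep: Callable[[float], None] = time.sleep,
    speed: float = 1.0,
) -> None:
    """Re-publish events in timestamp order, waiting (original gap / speed) between them."""
    ordered = sorted(events, key=lambda e: _parse_time(e["time"]))
    previous = None
    for event in ordered:
        current = _parse_time(event["time"])
        if previous is not None:
            sleep((current - previous).total_seconds() / speed)
        publish(event)
        previous = current
```

Sorting everything up front keeps the sketch simple but means the whole archive slice has to fit in memory; a real tool replaying millions of events would need to stream them.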
Jeremy: If we only had someone here who knew about observability.
Jeremy: Oh, wait we do.
Yan: So, Lumigo added support for EventBridge a while back, and I think Epsagon has support for it now, okay. So, with Lumigo, you can definitely just go to the Explore page and query any data that Lumigo captures for you, including stuff that traverses your EventBridge, and you can see your trace go through EventBridge, Lambda, EventBridge, Lambda, and so on. Whereas X-Ray doesn't support that yet. So, I think Lumigo, Epsagon, and I want to say Thundra as well.
Serhat: Thundra does as well, yeah.
Yan: Yeah, cool. So, yeah, all the serverless-focused observability vendors added support for EventBridge a while back.
Ben: And if we go back to the multi-account use case, we've been doing that on a lot of projects. We have one AWS account per service, and then one account per environment, which is great from a blast-radius and security point of view. We also have clients who have legacy architectures, and actually, just last week, I found myself writing some .NET in a legacy system, which I wouldn't advise you to do, but it was sending events to EventBridge in the new system. That was cross-account EventBridge, which allows us to do a really nice strangler-pattern-style migration to serverless: a progressive, minimum-viable-migration style, rather than a big flip-the-switch migration.
Sheen: Ben, a question about this cross-account event sharing. When I looked at it a while ago, one thing I didn't quite like was that I lose control over transforming what I send to the other account. It was kind of forcing me to send the original event to the other account, which is not going to work in every scenario, where the source account needs to control what it provides to the other account. Is that still the case? Do you see issues around this?
Ben: Yeah, I think it's still the case, and in our use case, it's really two trusted systems, so we didn't really have to think too much about limiting the data that's coming from the events. From a legacy-system point of view, sending data to other AWS accounts, maybe we could reduce the cost of doing that by reducing the payloads. But, yeah, with untrusted systems, we still have some issues around how we can filter the data at the source rather than in the target AWS account.
Yan: Sheen, in that case, why not just do the transformation in the destination account, between EventBridge and whatever eventual processing thing you've got? Because I think the input transformation is there, right?
Sheen: You mean between the two event busses or ... ?
Yan: No, not between the two event busses. Your event bus goes from the central account to the microservice account, and then you're going to process it, so can you not just do the transformation there, as opposed to between the busses?
Sheen: Yeah, I could do that, yeah. My point was about the source event when it goes out. That's where I would prefer to have the control.
Yan: Why is that?
Sheen: Say you're sending payment data from a service that captures payment data, tokens, and things like that. It may have a lot of PII or other data, and you want to send the event to another account that is dealing with orders, or something else. So, there are certain scenarios where it depends. If it's within the same department or organization, it's fine. If it's going to a different department or a different organization, then exposing everything from the original event can become an issue.
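One way to keep the control Sheen wants at the source is for the producing service to publish a reduced copy of the event for the other account. The helper below is a hypothetical Python sketch: the field names are invented for illustration, and how the reduced event is then delivered cross-account is left out.

```python
from typing import Any, Dict, Iterable


def project_for_consumer(event: Dict[str, Any], allowed_fields: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of an event whose `detail` keeps only the allowed fields.

    Envelope fields (source, detail-type, ...) are copied as-is; the original
    event is not modified.
    """
    allowed = set(allowed_fields)
    shared = {k: v for k, v in event.items() if k != "detail"}
    shared["detail"] = {k: v for k, v in event["detail"].items() if k in allowed}
    return shared
```

An allow-list is used rather than a deny-list so that a new sensitive field added to the source event is withheld by default instead of leaking until someone remembers to block it.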
Jeremy: Okay, Ran, do you have any other thoughts on the observability of EventBridge?
Ran: You're going to need this, because if you're using this as an event hub for all of your services, you do need something in place; otherwise, it will go chaotic. Especially if you're using EventBridge, it means that your system is definitely microservices-oriented. So, either make sure you're tracking message IDs and what happened to them, or choose an off-the-shelf solution; otherwise, it will get chaotic very fast.
Jeremy: Definitely, all right, so let's move on to your favorite announcement, Ran, and I know this wasn't necessarily ... I don't think this was announced at re:Invent, but it was pre-re:Invent, but I think it had a major impact on what you do.
Ran: Yep, so it's the Lambda Logs API for extensions. Obviously, I like almost all the announcements, but the one that I really cared about was the logs extension. You're right, it was pre:Invent, maybe a week or two before re:Invent this time. Basically, what they did earlier this year was provide Lambda extensions alongside the runtime. So, you can provide your own runtime and your own set of extensions on top of Lambda functions via the Runtime API, and do more things while your Lambda is running, before your Lambda is running, and after your Lambda is running.
Now, the third part of it, if I'm looking at the Extensions API and the Runtime API, the third thing is the Logs API. As we all know, most of the solutions out there that do monitoring, security, observability, or anything else for a Lambda function require some log analysis. They rely on ingesting logs and comprehending them in order to generate meaningful insights. So far, right up to re:Invent, or actually up to September or October, there was just one subscription destination for CloudWatch logs. That meant there was kind of a competition: how do we let a customer have several solutions listening to these logs?
Now, for re:Invent, I do have kind of a question for everyone. For re:Invent, they announced the Logs API, which allows me to choose a custom destination to ship, query, get, or analyze these logs. Which is fantastic, it's amazing, but I think it came about one month later than needed, because around the end of September, the CloudWatch Logs team announced two destinations.
Jeremy: Two destinations.
Ran: Like, exactly what we needed for such a long time. So, it feels like I can still use the old, traditional way of streaming logs with the built-in integrations, by the way: CloudWatch subscriptions can go to Kinesis, or through Kinesis Data Firehose to S3, or to a Lambda function, which makes good sense. Or I can start to use the Logs API. So, again, this is a great extension, this is a great capability, but it seems that somebody somewhere else solved this problem for us, at least for the meantime. Someday it will be, "Hey, we need three, we need four, we need five." So, the Logs API for extensions is, I want to say, unlimited, or at least not limited to a small number. But that's my take on this one.
Yan: You are limited to five extensions per function.
Yan: Another thing to keep in mind with extensions is that they run at the same time as your function invocation. You don't have background processing time after your Lambda function's invocation has finished running. Which means, in practice, what people end up doing for the extension is a subscription model, whereby they buffer things and then send them in batches. Otherwise, you're going to add delay to the end of every single function invocation, and there's no trigger, no signal for you to know when the Lambda function invocation has finished.
So, you don't actually know when in your extension you can safely say, "Okay, the invocation is finished, I'm going to spend 10 milliseconds sending the logs to whatever destination." Which means you have this weird batching into the next invocation, which means that in cases where there's a gap of idle time between invocations on the worker, your logs are not going to go anywhere until either there's another invocation, or the worker itself gets garbage collected.
And this gets worse when it comes to provisioned concurrency, because guess what? That worker is going to be sitting there for a long time, up to 8 hours, before it gets garbage collected. Which means, if you are running provisioned concurrency, there's a chance you're not going to see your logs for a very long time, unless there are regular invocations on those provisioned-concurrency instances. So, I agree.
Farrah: When you say a long time, I'm curious how long do you mean by that? What's the timeframe that people could expect for that?
Yan: Up to 8 hours. A Lambda worker has a lease of up to 8 hours, and with provisioned concurrency, it gets kept around from the moment it's created. If there's one invocation at the start, okay, the logs are buffered in the extension. If nothing happens for 8 hours, you're going to see the logs at the end of the lifetime of that worker, when it gets garbage collected, because that's when the extension gets a signal that the Lambda function is being shut down. So, you can now clean up, and as part of the cleanup, you can then send the logs to whichever third-party service you're using. Which means you're going to get this weird thing of, "Okay, where are my logs? I haven't seen them for, like, 10 minutes," because there's no activity on those provisioned-concurrency instances.
Ran: Yeah, and you're probably the one who read 100 percent of the fine print of serverless. For anything that is written on AWS, you know exactly how many hours it takes for logs to stream from a provisioned-concurrency Lambda.
Yan: As part of the release, I had to read a lot of the extensions documentation to figure out how it works, and had a chat with Santiago, one of the guys who runs the team that worked on that feature. So, I probably learned a bit more about how that works than I should have.
Jeremy: Hopefully, it's one of those things where most people don't need to know how that works. They just use Epsagon or Lumigo or Thundra or something and plug it in. But I have seen some people doing real-time log streaming with that, especially in development environments, which is really cool. I always looked at attaching listeners to your CloudWatch logs as a sort of lazy-logging type thing, because it's always delayed and always takes extra time. So, I think if you need real-time logs, the Extensions API certainly gets you much closer than you were before. But, yeah, I hadn't heard about that buffering problem of logs getting stuck in there; that's kind of interesting.
Ran: Yeah, I know that you have that problem of continuously loading logs from CloudWatch. Like, you refresh, refresh, and it seems to take longer than expected. I'd say that when you're subscribing to logs, they arrive much earlier. If you're doing real-time processing, they come faster than you'll see them in the CloudWatch console itself. So, that's one thing. But with the Logs API for extensions, it's real time, I would say in a matter of milliseconds or less. It's more for real-time analysis, or gathering or batching data. You might want to batch or gather some data, reduce it, down-sample it, or do something meaningful: instead of shipping all of the logs, just send the metrics derived from them. Or alert when a five-minute batch window seems to have some anomalous errors.
And, again, it's a matter of: if you can wait a second or so to get your logs, it's probably overkill, but if you're doing something that requires tons of data, or something that is more real time, that's probably the case for it.
Yan: So, Jeremy, there's one problem that makes what you're talking about, streaming, really difficult: in your extension code, you're polling the Logs API. Sorry, you're registering for events from the Lambda Logs API, but you don't have an event to tell you when the function invocation has finished, as in, when your actual function code has finished. So, you don't know when it's safe for you to stop your extension, because it has a similar polling model to custom runtimes. As with a custom runtime, your extension runs when the invocation starts, and then you have to say when you're ready to yield and give up control. So, if you're not careful, you end up running it for longer: the function finished here, but your extension is still running, so you end up adding actual delay to the Lambda invocation time itself.
Ben: One thing we actually did to get logs in real time for development environments, and we did this last year, before extension support came out: we created our own custom runtime with Node.js, overwrote console.log, and created an API Gateway WebSocket connection directly to the developer's computer. Then we could send console logs directly to the developer's computer before the function finished executing, in almost real time. So, if you do want direct feedback in your development environments, you could do this, but I definitely wouldn't suggest doing it in production. It was a bit of a hassle to set up.
Yan: Serhat, didn't you guys at Thundra have something a long time ago that let you do real-time debugging against a Node runtime, doing something similar, where you're running a Node debugger on Lambda and then pushing events to the IDE of someone who's listening to that endpoint?
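The buffering behavior Yan describes can be modeled with a toy Python class. It is not the Extensions API itself; it only assumes, as described above, that the extension gets to run when an INVOKE or SHUTDOWN event arrives, so buffered lines sit until the next invocation or shutdown unless a batch fills up first. `send` is a hypothetical stand-in for shipping logs to a third-party service.

```python
from typing import Callable, List


class LogBuffer:
    """Toy model of an extension that buffers log lines and ships them in batches."""

    def __init__(self, send: Callable[[List[str]], None], max_batch: int = 100):
        self._send = send
        self._max_batch = max_batch
        self._pending: List[str] = []

    def on_logs(self, lines: List[str]) -> None:
        # Called when a batch of log lines is delivered to the extension.
        self._pending.extend(lines)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def on_invoke(self) -> None:
        # The next INVOKE event is the extension's next chance to ship what it buffered.
        self.flush()

    def on_shutdown(self) -> None:
        # SHUTDOWN: last chance before the execution environment is torn down.
        self.flush()

    def flush(self) -> None:
        if self._pending:
            self._send(self._pending)
            self._pending = []
```

In this model, an idle provisioned-concurrency instance never calls `on_invoke`, so the lines from its last invocation only leave via `on_shutdown`, which is exactly the multi-hour delay discussed above.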
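Ran's idea of shipping derived signals instead of raw logs, such as alerting when errors cluster in a five-minute window, could look something like this Python sketch. The fixed threshold is an assumption standing in for real anomaly detection, timestamps are plain seconds, and errors are assumed to arrive in time order.

```python
from collections import deque
from typing import Deque


class ErrorWindow:
    """Counts error events in a sliding time window and flags when a threshold is hit."""

    def __init__(self, window_seconds: float = 300.0, threshold: int = 5):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._times: Deque[float] = deque()

    def record_error(self, ts: float) -> bool:
        """Record an error at time `ts` (seconds); return True if the window is at or over threshold."""
        self._times.append(ts)
        # Drop errors that fell out of the window ending at `ts` (assumes in-order timestamps).
        while self._times and self._times[0] <= ts - self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.threshold
```

Only the alert decision, or a periodic count, would need to leave the function, which is the kind of reduction Ran describes.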
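Ben's team did this in Node.js by overriding `console.log`; a rough Python analog is a `logging.Handler` that forwards each record as soon as it's emitted. The `send` callback is hypothetical: in a setup like Ben's it might post to an API Gateway WebSocket connection (for example, via boto3's `apigatewaymanagementapi` client and its `post_to_connection` call), and, as Ben says, this is for development, not production.

```python
import logging
from typing import Callable


class ForwardingHandler(logging.Handler):
    """Sends each formatted log record to a callback as soon as it is emitted."""

    def __init__(self, send: Callable[[str], None]):
        super().__init__()
        self._send = send  # hypothetical transport, e.g. a WebSocket push

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._send(self.format(record))
        except Exception:
            # Never let a broken dev-only transport crash the function.
            self.handleError(record)
```

Usage: attach it to a logger, for example `logger.addHandler(ForwardingHandler(push_to_dev_socket))`, where `push_to_dev_socket` is whatever hypothetical function delivers a string to the developer's machine.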
Serhat: I know, Sercan was doing some crazy stuff. Yeah, I don't know the details, I don't want to know the details.
Jeremy: All right, let's just hope that somebody figures it out and it works. All right, let's move on, so, Farrah, what was ... You got a favorite announcement?
Farrah: I watched a lot of customer stories, because for Stackery I was writing about companies having success with serverless. And what really excites me is that if you're starting a product or company today, you literally have instant access to the compute power that enterprises are using. You have the security and performance, and you can get to the scale that you need. I saw all these talks about dealing with hundreds of terabytes of data that they needed to import or upload somehow. So, watching all these, you really see the raw power of AWS. I know there's simply no way that people could have done these types of things prior to the cloud without spending, I don't know how much money, but it'd be astronomical. Those types of things are really, really exciting. At a time when I've felt pretty stagnant, when a lot of us aren't traveling and we're kind of stuck, it's great to really see that innovation is still happening.
And actually, it's even moving faster, and during COVID, companies were able to scale and respond to their different needs. I saw a lot of stories, from Liberty Mutual, and, Sheen, yours from LEGO. But there were tons of them, from Volkswagen to Autodesk, and all of that was, I guess you'd say, pretty powerful and incredible. It made me feel a lot less stagnant.
Jeremy: Yeah, I think that the idea of the number of people that have been able to build companies in the cloud, using serverless technology, without having to spin up hundreds of thousands of dollars worth of equipment to do things. Especially, even just some of the machine learning stuff that is getting baked in. I was at a small startup before and we were barely doing anything in terms of ... we were a small customer of AWS, and I think our bill every month was like, $18,000. And, this was what? Seven or eight years ago. If we had serverless, at the time, our bill probably would have been $2000 a month. And, so what else could you have done?
And, it also means, what else can someone else do? What can the single developer do, or the small development team? Or someone who's just interested in maybe experimenting with something. I mean, it opens up a lot of doors.
Farrah: It definitely does. We're seeing that with startups, and in fact, I hope to have a couple of case studies out about this soon. You really see the power and speed these teams have, how that extends their runway, and how they deliver on their roadmap a lot sooner, and just what that feels like. To me, that's really exciting, because I feel like we all need something to hold onto right now to keep us moving and engaged in what we're doing. Those types of things really help me.
Sheen: I think that's an important point that Farrah mentioned. Startups don't become successful simply because of the cost factor. I recently wrote a blog post about all the flexibility and tooling that we get for free, or for next to nothing, that allows us to move fast. That's an important thing. I watched quite a few of these real use cases at re:Invent, and it amazes me how differently people use serverless. The one I liked was the Scottish Land Registry: tons and tons of records, real documents, handled with S3 and serverless. Amazing, amazing stuff.
There was another thing I'd never thought of. I was watching a DynamoDB-related talk from a famous entertainment company. Their approach is to keep a table in on-demand mode when they launch a product, because they don't know the volume of traffic or the capacity they'll need. Then, once they've studied the traffic pattern, they switch it to provisioned capacity mode. I'd never really thought of that, because usually you provision a table and that's it, you're done. You never go back and change it. So, amazing stories and really cool tips to take home.
Jeremy: Right, and I think that's a good point you make, too, about setting the provisioned throughput for DynamoDB. There are actually several services where it doesn't always have to be on-demand. There's provisioned concurrency for Lambda, or obviously provisioned throughput for DynamoDB, where you can set a certain baseline for where you want to be. And there's a lot of flexibility in that, so if it goes above it, it's still going to scale. But you can really optimize your costs as well. So, even in situations where you might say, "Well, it's all on-demand, it's going to be really expensive, and I have a really flat sort of pricing model," there are ways to deal with that. That includes Savings Plans for Lambda. So, even if you are using Lambda quite a bit and it's not a spiky workload, you can still find savings there. You've just got to do a little bit of digging, but it's possible.
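The tip of starting a table on-demand and switching to provisioned capacity once the traffic pattern is known comes down to turning observed peaks into capacity units. A minimal sketch, assuming you already know peak reads and writes per second and a typical item size; the 20% headroom and the function names are illustrative assumptions, not the method from the talk. It applies DynamoDB's sizing rules (one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; one WCU covers one write per second of an item up to 1 KB) and builds the arguments for an `UpdateTable` call.

```python
from typing import Any, Dict, Tuple


def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division, so no floating-point rounding is involved."""
    return -(-numerator // denominator)


def capacity_units(peak_reads_per_sec: int, peak_writes_per_sec: int, item_bytes: int,
                   eventually_consistent: bool = True, headroom_pct: int = 20) -> Tuple[int, int]:
    """Estimate (RCU, WCU) from observed peaks plus a headroom percentage."""
    rcu_per_read = ceil_div(item_bytes, 4096)   # 4 KB read units
    wcu_per_write = ceil_div(item_bytes, 1024)  # 1 KB write units
    # Eventually consistent reads cost half a unit each.
    read_divisor = 100 * (2 if eventually_consistent else 1)
    rcu = ceil_div(peak_reads_per_sec * rcu_per_read * (100 + headroom_pct), read_divisor)
    wcu = ceil_div(peak_writes_per_sec * wcu_per_write * (100 + headroom_pct), 100)
    return max(1, rcu), max(1, wcu)


def switch_to_provisioned_request(table_name: str, rcu: int, wcu: int) -> Dict[str, Any]:
    """Keyword arguments for DynamoDB UpdateTable. Global secondary indexes,
    which need their own capacity settings, are ignored in this sketch."""
    return {
        "TableName": table_name,
        "BillingMode": "PROVISIONED",
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    }
```

The resulting dictionary could be passed as `boto3.client("dynamodb").update_table(**request)`; a table name such as `"orders"` would be your own. All arithmetic stays in integers, so the ceiling is exact.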
Sheen: Yeah, it was Disney, actually, that DynamoDB talk.
Jeremy: Oh, yeah, Disney+, right yes. That was a good talk, that was a good talk.
Sheen: Yeah, it was, yeah.
Gillian: And they made the Cost Anomaly Detection service GA during re:Invent, and I believe it's free. So you should definitely turn it on, because it saved me right after re:Invent. I went in and started turning things on and trying things out, and left the server on.
Jeremy: How many people are using a table and wondering, "Why am I paying this money for a provisioned table?" All right, so, I want to finish up with you, Ben. We already talked about EventBridge, which I know is a big thing for you. But was there anything else at re:Invent that really stuck out for you?
Ben: Sure, yeah. Well, EventBridge was the highlight for me, but let me take a second to think about what my second topic would be. I'm actually working at the minute on a big serverless data lake project, where we have data going into an S3 data lake, and then we're querying that with Athena and visualizing it in QuickSight for business intelligence. That's going really well, but what we need now is more realtime insights. We were previously doing this with some data going into DynamoDB and querying off of that. But what came out at re:Invent, and probably hasn't been the most talked-about feature, is tumbling window support for Lambda.
So, previously, let's say we had data in a DynamoDB table with a DynamoDB stream into Kinesis. We could do some processing on that data, but we couldn't really build aggregate statistics based on previous data. With this support, we can now have the stream of data coming into Kinesis, and for each batch of data, one of the inputs can be the state output by the previous batch of computation.
So, we could, for instance, calculate the day's sales by taking the output of the previous batch and adding on to it. We had a slightly different use case, and it's a little bit more complicated. But this has really helped us get realtime data visualized for the user.
Yan: So, Ben, could you not do that with Kinesis Analytics before? Or is that not possible?
Ben: Yeah, so, Kinesis Analytics, I think, is definitely the other way to do it, and Kinesis Analytics actually had some new stuff out at re:Invent, I think. When we tried it, we found Kinesis Analytics a little bit restrictive, because we weren't just doing a simple summation; we needed more complex state coming from our previous execution. But, yeah, there might be a way to do it with Kinesis Analytics. The tumbling window support was just a little bit more flexible for our use case.
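The tumbling window mechanism Ben describes can be sketched as a Lambda handler for a Kinesis event source: whatever the handler returns under `state` is passed into the next invocation for the same shard and window, and the last invocation for a window is flagged with `isFinalInvokeForWindow`. This is a minimal sketch, assuming each record carries JSON with a hypothetical `amount_cents` field; it is not Ben's actual use case. State only carries over within a window, and a window is at most 15 minutes long, so a running total for a whole day would still need to be persisted at the end of each window.

```python
import base64
import json
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    # State returned by the previous invocation in this window; empty at the window start.
    state = event.get("state") or {}
    total_cents = state.get("total_sales_cents", 0)
    order_count = state.get("order_count", 0)

    for record in event.get("Records", []):
        # Kinesis record payloads arrive base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        total_cents += payload["amount_cents"]  # hypothetical field name
        order_count += 1

    new_state = {"total_sales_cents": total_cents, "order_count": order_count}

    if event.get("isFinalInvokeForWindow"):
        # Last call for this window: a real system would persist or publish the
        # aggregate here (for example, for a dashboard); this sketch just logs it.
        print(json.dumps({"window": event.get("window"), **new_state}))

    # Whatever is returned under "state" is handed to the next invocation in the window.
    return {"state": new_state}
```

The window length is configured with `TumblingWindowInSeconds` on the event source mapping. Amounts are kept in integer cents so repeated additions don't accumulate floating-point error.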
Yan: Okay, sure.
Jeremy: Right, yeah, and besides tumbling windows, there were a couple of other things. Custom checkpoints were added in, and SQS batch windows. Again, just different ways you can have a little bit more control over how you're processing your data. I think that was a big win. I talked to Ajay Nair right after re:Invent and went through some of these different things, and it seems to be AWS's goal to take all these little objections, like "I can't do this," or "I don't have enough control over this or that," and just keep adding those features in. And then, I think that goes back to your point, Gillian, where eventually it's just going to be a lot of configuration, and maybe no code at all.
Yan: I think the SQS batch scares me a little though.
Sheen: Why is that?
Jeremy: Why is that?
Yan: Because dealing with partial failures is already kind of tricky.
Jeremy: You don't want to deal with 10,000 partial failures?
Yan: Yeah, yeah. Okay, imagine you've got a batch of 10,000 and two records failed. How do you then know which ones to delete yourself and which ones you don't, so you can let them retry? If you don't, then you have to handle idempotency and make sure it's done right. And how do you track that 9,998 records were processed previously and two weren't?
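One common way to handle Yan's scenario with an SQS trigger is to process every message, delete the successful ones yourself, and then fail the invocation so that only the failed messages return to the queue after the visibility timeout. A minimal sketch of that approach; `process_sqs_batch`, `make_sqs_deleter`, and the queue URL are hypothetical, while `delete_message_batch` is the boto3 SQS call, which accepts at most 10 entries per request.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Entries = List[Dict[str, str]]


def process_sqs_batch(records: List[Record],
                      process: Callable[[Record], None],
                      delete_batch: Callable[[Entries], None]) -> int:
    succeeded: List[Record] = []
    failed_ids: List[str] = []
    for record in records:
        try:
            process(record)
            succeeded.append(record)
        except Exception:
            failed_ids.append(record["messageId"])

    if failed_ids:
        # Delete the successes ourselves, at most 10 entries per DeleteMessageBatch call...
        entries = [{"Id": r["messageId"], "ReceiptHandle": r["receiptHandle"]} for r in succeeded]
        for start in range(0, len(entries), 10):
            delete_batch(entries[start:start + 10])
        # ...then fail the invocation, so only the failed messages remain in the
        # queue and become visible again after the visibility timeout.
        raise RuntimeError(f"{len(failed_ids)} message(s) failed: {failed_ids}")

    # Whole batch succeeded: Lambda deletes every message itself.
    return len(succeeded)


def make_sqs_deleter(queue_url: str) -> Callable[[Entries], None]:
    """Hypothetical wiring to a real queue via boto3."""
    import boto3  # only needed when talking to AWS

    sqs = boto3.client("sqs")

    def delete_batch(entries: Entries) -> None:
        # The response's "Failed" list is ignored in this sketch.
        sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)

    return delete_batch
```

If a deletion itself fails, that message will be redelivered even though it was processed, which is why idempotent processing is still worth having.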
Jeremy: Right, right.
Sheen: That's interesting, because someone asked me the same question. I mean, not 100,000, sorry, 10,000, but even with a batch of 250 messages they had the same concern. You know what I did? I pointed that person to your recommendation, Yan, to manually do the deleting, so that when something fails you don't get everything reprocessed. I mean, that's why I like the one that ... What is it called? Custom ...
Jeremy: Custom checkpoints.
Sheen: Yeah, checkpoint, yeah, that helps. So that you don't kind of get into this mess, just ...
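Custom checkpoints for Kinesis and DynamoDB Streams let the handler report where a batch failed: with `ReportBatchItemFailures` enabled on the event source mapping, the handler returns the sequence number of the failed record in `batchItemFailures`, and Lambda retries from that record instead of reprocessing the whole batch. A minimal sketch for a Kinesis source; `process_record` stands in for hypothetical business logic.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def process_with_checkpoint(records: List[Record],
                            process: Callable[[Record], None]) -> Dict[str, List[Dict[str, str]]]:
    """Process records in order and report the first failure as the checkpoint."""
    for record in records:
        try:
            process(record)
        except Exception:
            # Stop here: every record before this one is done, so Lambda can
            # resume the batch from this sequence number.
            return {"batchItemFailures": [{"itemIdentifier": record["kinesis"]["sequenceNumber"]}]}
    # An empty list tells Lambda the whole batch succeeded.
    return {"batchItemFailures": []}


def process_record(record: Record) -> None:
    """Hypothetical per-record business logic."""


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    return process_with_checkpoint(event["Records"], process_record)
```

Stopping at the first failure keeps ordering within the shard intact, since nothing after the failed record is processed until the retry.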
Jeremy: Yeah, I mean, the bisecting of batches was great, but you would still end up reprocessing a bunch of things. Now you can just say, if it fails at this point, I'm going to start again at that point. So that's a pretty good thing. All right, does anyone else have any other thoughts or big things that happened at re:Invent that you want to share? The floor is yours. Luca?
Luca: Yeah, I think we had great announcements shifting machine learning more toward DevOps, because AWS announced SageMaker Pipelines, which is great for managing machine learning models as they shift from development to production. It's something that was missing, and it fills the gap between the data scientist and the developer. I was having a very nice discussion with a friend a couple of weeks ago, and he told me this means that data scientists are no longer wizards with magical books full of spells; they are becoming engineers, and they are bringing things into production. And SageMaker Pipelines goes in that direction.
Jeremy: Yeah, no, I think machine learning ... There were a lot of announcements with machine learning. I wish we had more time, but we've already been talking for a while. I try to stay away from machine learning, but maybe that's just because I watched The Terminator too many times when I was a kid, and I'm very nervous about Skynet becoming self-aware. Anyway, let's leave it there. Everyone, thank you so much.
Sheen, Serhat, Gillian, Yan, Ran, Farrah, Ben, and Luca, this has been absolutely amazing. Nine people may have been too much. I don't know, we might have to cut it down next time, but I appreciate all your insights. I'm going to put in the show notes links to Twitter and information on all of these amazing people that were on the show today. Thank you for watching, thank you all for participating, and we will see you next time.
Serhat: Thanks, everyone.
Gillian: Thank you.
Luca: Thank you very much.