October 7, 2019 • 53 minutes
Jeremy chats with Brian LeRoux about why he and his team built the Architect Framework, how it makes building modern serverless apps easier, and why DynamoDB should be your cloud database of choice.
Brian LeRoux is currently building a continuous delivery vehicle for cloud functions called begin.com on an open source foundation called arc.codes. Previously he worked at Adobe on PhoneGap and Apache Cordova. Brian believes the future will be writ as functions, seamlessly running in the cloud, agnostic of vendors, on an open source platform and it will be stewarded by hackers like you.
Jeremy: Hi, everyone. I'm Jeremy Daly, and you're listening to Serverless Chats. This week, I'm chatting with Brian LeRoux. Hey, Brian. Thanks for joining me.
Brian: Thanks for having me. Excited to be here.
Jeremy: So you are the Co-Founder and CTO at Begin. So why don’t you tell the listeners a little bit about yourself and what Begin does.
Brian: Yeah. Cool. So, I'm Brian. I’m a Webby hacker. I guess you could say I've been building software for a really, really long time now, and my focus has been, in the last few years, the cloud. In a previous life, I used to work in the mobile space quite a bit and Begin.com is continuous integration and delivery for serverless applications — modern serverless applications.
Jeremy: Awesome. Alright, so I wanted to have you on today to talk about the Architect Framework. So this has been out there for quite some time, but just in case people aren't familiar with what it is, maybe you could just tell listeners what the Architect Framework is all about and what you can build with it.
Brian: Yeah, Architect is a serverless framework. It papers over some of the more complex bits of getting up and running with a serverless application. It's sometimes accused of being pretty opinionated, but I think maybe we'll dig into that a bit in this episode and how maybe it's not so much opinionated, just, you know, makes some choices up front for you and saves you time. It's much more convention than it is configuration. And it's really targeted at building super fast web apps.
Jeremy: So let's get into that. So first of all, why did you build this? Because there's Claudia.js and there’s SAM, and there’s Serverless Framework, and there’s — you name it, there's a framework out there that helps you build serverless applications. So what was the reasoning behind it?
Jeremy: And I think that's actually kind of typical of building serverless applications. I know as I started building serverless, as I was working on my first few serverless applications, I built the Lambda API web framework, right, because I just needed a better way to process API Gateway calls using the Lambda proxy integrations. And then, I built the Serverless-MySQL package to deal with the max connections issue. And you start building these components to help you build the products that you want to build, and eventually, you get some really cool tools that come out of it. And that's basically what you did with Architect, and you’ve made that open source, right?
Brian: Yeah. We donated it to the, at that time it was called the JS Foundation. But the JS Foundation and the Node.js Foundation merged and became the OpenJS Foundation. I've got a pretty long history in open source, and I really believe in foundation-backed governance for projects. And so it was important to me to pull that IP out of a privately-held, venture-backed startup and put it into a place where the commons could contribute back to it. And just as a note, so the listeners understand, I'm not a zero sum thinker. There's going to be more than one framework. Technology tends to be additive, and there's going to be different things that we could learn from each other and build on from each other. So by all means, check out Architect, but, you know, if you're a Python hacker, you would be remiss not to check out Chalice. And if you're deep into AWS, you're probably already using SAM, and I think that's just fine. That makes sense to me.
Jeremy: Awesome. Alright, so let's talk about this opinionated thing because when I first saw this — it is kind of funny — but when I first saw the Architect Framework, I was looking through it, and I'm like, okay, this seems like it is built for solving a very specific problem. And again, I know it can be extended and you've got other things we can talk about, like some of your macros and things that you can build on top of it, but it just seemed very opinionated to me, in the sense that you were enforcing small file sizes, single purpose Lambda functions, that kind of stuff. So what are your thoughts on that? Because you maybe don't think it's so opinionated, right?
Brian: I didn't, and Yan Cui from the Burning Monk wrote an awesome blog post where he threw Architect way up in the opinionated corner. And when we first saw it, we were like, “Oh, weird. Okay.” So we do look different and we do look like we have opinions, but I think most people will share those opinions. So one of our opinions is that we need to be really fast. And we need to be fast both at author time and at runtime. So by author time speed, I mean, we need deploy iterations and lead time to production to be really quick. Monolithic apps have pretty poor characteristics for this. They tend to be deployed in minutes to hours, if not days or weeks, whereas serverless applications, because we break them apart into small constituent pieces, or we can, we can deploy those artifacts in parallel. We get a lot faster deployment speeds as a result. So that's really nice. And I like that, and I like small functions for that particular aspect. There's a single responsibility principle as well. When you have bigger functions, they're going to be harder to debug by default. If you have small single responsibility functions, your discovery and maintenance gets a lot easier. For example, if there's a bug on the get about page, and there's one Lambda function serving the get about page, well, you know exactly where the problem is. It's on the get about page function. It's not somewhere in this ball of code that you have uploaded. The kind of final piece to this with the small functions bit was really driven by practicality. So in our first version of our bot for Begin, we did the same thing everyone else did. We put an Express web server inside of a Lambda function, and it worked — it worked really well until we started building something bigger than Hello World, and then it started to not work so well. And in particular, the thing that didn't work well was the cold start. And you hear about cold starts all the time.
And it sort of irks me because I feel like this problem has been solved for a long time, too. We measured cold start with every different runtime with varying payload sizes, and we determined that it was correlated to payload size. So we did the thing where we just made small functions. So we found that in the earlier versions of Lambda, there was a 5MB kind of magic number. If you were over 5MB, you would be over a second cold start. If you were under 5MB, you would usually be sub-second cold start, which was totally a suitable performance profile for a bot. So we just set up our CI to fail build after functions got bigger than 5MB, and we started dividing up our app in the single responsibility principle.
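As a rough illustration of the CI check Brian describes, here is a minimal Python sketch that zips each function's files in memory and reports any package over a size limit. The function names, the in-memory file layout, and the helper names `zipped_size` and `oversized_functions` are assumptions made for this example; this is not how Begin's own CI was implemented, and the 5MB figure is simply the number quoted above.

```python
import io
import zipfile

MAX_ZIPPED_BYTES = 5 * 1024 * 1024  # the 5MB threshold quoted in the conversation


def zipped_size(files: dict[str, bytes]) -> int:
    """Return the size in bytes of a deflate-compressed zip holding `files`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return len(buffer.getvalue())


def oversized_functions(
    functions: dict[str, dict[str, bytes]],
    limit: int = MAX_ZIPPED_BYTES,
) -> list[str]:
    """Return the sorted names of functions whose zipped package exceeds `limit`."""
    return sorted(
        name for name, files in functions.items() if zipped_size(files) > limit
    )


if __name__ == "__main__":
    # Hypothetical functions; a real check would read the files from disk.
    functions = {
        "get-index": {"index.py": b"def handler(event, context): return {}"},
        "get-about": {"index.py": b"def handler(event, context): return {}"},
    }
    too_big = oversized_functions(functions)
    if too_big:
        raise SystemExit(f"Functions over the size limit: {too_big}")
```

A CI job could run a script like this before deploying and treat a non-zero exit as a failed build.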
Jeremy: And by payload size, you mean the package size of the artifact, right?
Brian: Yeah, the zipped package size, actually. Yeah, and I've been talking to other devs about this, and there's been a lot of movement in the last four years on Lambda, and it's gotten a lot better at cold starts. And I imagine these numbers are different today, but just because you can, doesn't mean you should. I think it's totally appropriate to build out your first versions with just a few fat functions. But as time goes on, you're gonna want that single responsibility principle and the isolation that it brings. One last small, interesting advantage to this technique is that the security posture is just better. You have less blast radius. If your functions are locked down to their least privilege and their single responsibility, you're just going to have a way better risk profile for security.
Jeremy: Right. So the other thing again, maybe this is why Yan and maybe some others, including myself, thought it was an opinionated framework, and that’s because it's — I don't think limited is the right word, but it's specifically curated, I guess, with just a few core components that you can use to build applications. But with that small set of services, you can quite accurately replicate the execution environment on your local machine, right?
Brian: Yeah. This was another really important thing for us, and I think it was, frankly, probably more of a coping mechanism than anything else. When you open up that AWS console for the first time, it's a pretty intimidating experience. There's over 300 services there, you know, and you don't know where to look. You don't know how they integrate with each other and they've got different UIs for each service. And we sat back and really looked at the requirements of our application and distilled it down to its constituent parts, what protocols we needed to support and how. And we realized we only needed eight services. We didn't need 300. We just needed a subset of them to build a CRUD-y web app. So with that knowledge, we really built our abstractions on top of those eight services. We don't hide the other services from you. We just paved the path for those eight and make it really smooth and easy to get on board with them. The canonical example of difficulty for configuration would be probably API Gateway. And most people would agree it's a bit of a beast and that it’s a powerful beast, but it's a scary, powerful beast. And so most people just want to give it some URLs and say, “please return values from a function when these URLs get invoked.” They don't want to get into the depths of Velocity templates and the rest of it. So we paper over API Gateway. We make that part look really, really simple, even though it's really complicated under the hood, and then we add some sugar — some terseness I should say — to the configuration file format for Dynamo, SQS, SNS, and a handful of other services that really aren't that user facing. And that's kind of it. We have a macro primitive as well that lets you reach into the CloudFormation that gets generated. So you can access anything in AWS that you want. It's just our contention that, most of the time, for a large portion of applications you won't even need to.
Jeremy: And so that's another thing that I think is interesting about the Architect Framework, is the fact that you have those primitives for, like, SQS and SNS. And I know they're named after AWS services, but if you think of like the Serverless Framework, for example, they sort of abstract away the connection to things like SQS queues. Like it'll create certain resources for you, but it's generally in the context of a connection to a function. But if you just wanted to create your own SQS queue, you'd have to write the CloudFormation and put that in the resources section of your serverless.yml file in order to build that SQS Queue, and the same with SNS and DynamoDB is a great example. So by having these primitives — and we can talk more about that in a minute — but is that something that you're looking ahead for, something like cloud portability?
Brian: I think that there is a possibility of portability in a longer-run future. I’m less interested in disintermediation, and I’m more interested in velocity on the de facto cloud. And this isn't sucking up to Amazon, this is just being straight up with where the state of the industry is. If I had better options, I would take them for sure. But AWS has a really big lead in this world, and the other players are, frankly, just catching up. Azure has a concept called Azure Resource Manager, which is — ARM is the acronym — that is their equivalent to CloudFormation. It's very new. Can't really build a full app with it yet. Google doesn't even have an answer to this idea of infra as code yet. So I guess Terraform would be — Terraform and Kubernetes YAML files would be the answer. So portable where? Would be my question. And I just don't think…I think they'll catch up. I absolutely believe that there's going to be more than one cloud. I just don't know who it is yet. So what I do know, though, is that AWS is amazing. It gives me the characteristics that I want, and they've been a wonderful partner to work with too, so we can build on there with a high degree of confidence that they are probably the de facto standard. And we'll see where the other players get to. But like ARM looks a whole lot like CloudFormation to me, so can we translate CloudFormation into ARM one day? I don't know. Maybe. Do I want to do that? Not super badly. To be honest with you, it all comes down to the data store and DynamoDB is real hard to beat. I'm sure Cosmos is going to try, but I am an extremely happy locked-in Dynamo user right now, and I don't see why I would adopt more latency to use it. So that's kind of my perspective there. But, you know, is it possible? Sure. Do you want to? Probably not. Not right now.
Jeremy: Yeah, at least not right now. Definitely. Well, I want to talk about DynamoDB. I have a whole bunch of questions around DynamoDB that I want to ask you, but maybe let's go back to the Framework and talk about, you know, how does it help you build sort of these modern serverless applications?
Brian: Yeah. So it gets you off the ground running really fast, and, you know, everyone makes that claim, but I'd like to quantify it. So within 10 seconds, we should have a local development environment that fully replicates exactly what you would have in the cloud. Folks would say, you know, “This is impossible.” I heard the same thing about mobile emulators 10 years ago. It's not impossible. It's very possible. AWS doesn't change their APIs all the time. In fact, they change them extremely rarely and again, we're subsetting to a very small number of services. So we have a pretty kick-ass local development environment. It's a couple years old now, and it's been matured in the open source world, and it works real fast, so you don't need to deploy or even to have AWS creds to get started. And that's a really big advantage for understanding how this thing fits together. We even have DynamoDB running locally using Michael Hart’s amazing Dynalite project. And then the next step beyond that is like, okay, cool. I want to get these bits up in the cloud. And once you've credentialed yourself with Amazon correctly, it's one command and you're there. There's no configuration to speak of, other than the arc file, and everything else gets generated by convention. And this leads to a pretty slick development experience, you know, kind of what we're used to. CRUD apps back in the day where you could generate routes, and see them deployed and see them respond to HTTP events and, you know, have state shared between them but in a stateless execution environment. And that's really the class of app I'm building on AWS today. There's obviously a whole lot of other workloads that are possible, but we're really tuned for that use case, in building a web app as fast as you can.
Jeremy: Right and that's another thing that I really like about Architect, is that you're not trying to be all things to all people, right? And it’s great, that bootstrapping locally or just getting you up and running right away without even connecting to the cloud, I think is sort of an interesting approach to doing that. But in order to deploy, you end up just generating SAM templates under the hood though, right?
Brian: Yeah, but we didn't initially. Actually, this is a fairly recent addition. Architect 6 is all SAM and CloudFormation-based. We were able to delete roughly 20,000 lines of code, which I'm going to get into in my Serverlessconf talk. So just going to CloudFormation, for the listeners, by the way, like screw whether or not you use Architect. That's cool if you do. But even if you don't, learn from us, and the learning is, we did an SDK-based framework for the first few years of its life and the net result of us moving to CloudFormation was a massive code deletion. And we gained features in the process. We gained a ton of features in the process. CloudFormation is absolutely where the puck is going to be. And to me, it's the de facto standard. I'm certain a lot of people would cringe at that one, because it's not, you know, a consortium-based spec standard, but it's the right way to build an AWS application. It's less code. It offers a huge amount of determinism, and yeah, we've been really happy moving to it as the baseline. And so ARC really is a file format and that file format is extremely terse. It's like if YAML chilled out. And the file format is something a lot of people get a little bit bent out of shape about, but we had good reasons for doing this. So one, we wanted comments and JSON doesn't have comments. Two, we didn't want deeply nested structures, and YAML really, really encourages deeply nested structures. And so we didn't want to use JSON. We didn't want to use YAML, and so we were using INI-like files for a little bit. And then we kind of ended up creating our own syntax along the way. And it's really terse. It's extremely readable. You can also write it, and this is a big benefit to Architect. You can look at the manifest file, and within a few lines of code, you will understand what that application does. Nobody could look at a SAM or a CloudFormation document and know what that application does.
It doesn't tell you anything. It just shows you a lot of stuff. So we translate that ARC file into a SAM document for you, and we dump it in the root of your directory so you can see the delta for yourself. But usually it's a 60 to 80x reduction in configuration, which is a very huge productivity improvement that is quantifiable. You know, you can see it for yourself, just run it and boom. You just generated a shitload of CloudFormation.
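To give a feel for how little structure a manifest like this needs, here is a minimal Python sketch that reads a simplified, arc-style file into sections. The example manifest (app name, routes, event and queue names) is made up for illustration, and `parse_manifest` is a hypothetical toy parser; it is not Architect's own parser and it ignores parts of the real format such as indented table definitions.

```python
# A made-up manifest in the flat "@section, then one item per line" style.
EXAMPLE_MANIFEST = """\
# comments are allowed, unlike JSON
@app
testapp

@http
get /
get /about
post /graphql

@events
account-created

@queues
send-welcome-email
"""


def parse_manifest(text: str) -> dict[str, list[str]]:
    """Group non-empty, non-comment lines under the most recent @section."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("@"):
            current = line[1:]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


if __name__ == "__main__":
    for section, items in parse_manifest(EXAMPLE_MANIFEST).items():
        print(section, items)
```

Reading the parsed sections back is enough to see what this hypothetical app does: three HTTP routes, one event topic, and one queue.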
Jeremy: I have spent days working on CloudFormation files, so yes, I know full well that it would be nice. So you had mentioned being able to access those files as well. So the ARC file is your configuration. Listeners have to go check this out, because, like you said, it’s ridiculously terse. I mean, it's so short what you need to do in order to generate probably, what? 500, a thousand lines of CloudFormation at the end of the day or something like that?
Brian: Yeah, and so we've really distilled a lot of what we feel are the best practices. Maybe, we've distilled a lot of the opinions that are out there on how to do this. So because we know all the resources up front, we can give you a least privilege role and attach it to those.
Jeremy: Oh right. Yeah, yeah, that’s awesome.
Brian: So you don't have to do any of that. The other big, tricky thing in CloudFormation is getting service discovery right. So, ideally, when you're building out your serverless application in CloudFormation, none of your resources have a human-readable name. You can give them logical IDs, which are human-readable names. But you want that generated stuff to effectively be GUIDs, and you don't want that to have any significance or meaning because we want to treat our services like cattle. We don't want to treat them like pets. So we want to be able to wipe those out and recreate them and what have you. And so at runtime, this becomes a pain in the ass. If your database table is a GUID, it's really hard to find. So Architect generates a service discovery scheme for you by using SSM parameters, which are free tier. There are other ways to do this. Some people like to use environment variables, but those get out of hand with structured data, and other people like using Cloud Map, which I think has probably got a good future. But Cloud Map is also extremely expensive. It's 10 cents per resource per month, and you could rack up a lot of resources. We have thousands for Begin, so it just wasn't realistic for us. So we use SSM, which is a free key value store and has great throughput. And we can cache the results of those lookups and so you can interact with your DynamoDB table at runtime as though it had a real name. But under the hood, it does not. So those are just some of the things that we can do with that CloudFormation for you without you having to think about it. There's a ton more minutiae, especially around IAM roles and the security aspects.
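The lookup-and-cache idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `ServiceDiscovery` class, the `/app/kind/name` key layout, and the generated table name are all hypothetical, and the parameter store is passed in as a plain function so the sketch runs without AWS credentials. It is not Architect's actual implementation.

```python
from typing import Callable


class ServiceDiscovery:
    """Resolve logical resource names to generated physical names, caching lookups."""

    def __init__(self, app: str, fetch: Callable[[str], str]):
        self._app = app
        self._fetch = fetch  # e.g. a thin wrapper around a parameter store read
        self._cache: dict[str, str] = {}

    def resolve(self, kind: str, logical_name: str) -> str:
        key = f"/{self._app}/{kind}/{logical_name}"  # hypothetical key layout
        if key not in self._cache:
            self._cache[key] = self._fetch(key)  # only hit the store on a miss
        return self._cache[key]


if __name__ == "__main__":
    # Stand-in for the parameter store; the generated name is made up.
    fake_store = {"/testapp/tables/accounts": "TestappStaging-AccountsTable-1ABCDEF"}
    discovery = ServiceDiscovery("testapp", fake_store.__getitem__)
    print(discovery.resolve("tables", "accounts"))
```

In a deployed function, `fetch` could wrap boto3's `ssm.get_parameter(Name=key)["Parameter"]["Value"]`, so application code keeps using the readable name `accounts` while the physical table name stays generated.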
Jeremy: So then if you were to build out something, you generate all this CloudFormation or the SAM template, you said you can add custom — well, there's two things you could do. One, you can add custom resources through your macros. But the other thing you can do is if you just say, you know, I don't want to use Architect anymore, I just want to take my bootstrapped SAM file, I can just eject and go my separate way.
Brian: You can bail, and we actually have a playground on the website. If you go to arc.codes/playground, we've got one of those two-up things. On the left side, you write Architect, and on the right side, we show you the generated CloudFormation template. And yeah, there was an eject path in this that we really wanted to have. I feel that increasingly CloudFormation is the standard way to build AWS applications. If the cloud had a file format, it would probably be CloudFormation. So in order for us to have interoperability with that ecosystem and also the, you know, the portability into things like SAR, we really wanted to have that ability to eject and not hide it behind a leaky abstraction.
Jeremy: And so speaking of this idea of putting things into SAM, when you were using SDKs, all of the services at AWS, you can control with their APIs, right? That's like the first thing they release. CloudFormation, not so much, so are you handicapped at all by using SAM now, in some cases, or just have you not run into those limitations yet?
Brian: So one of the beauties of the ARC macro system is that it can run after deploy. And we haven't exposed all of this yet in documentation but it's in the code and we do it ourselves. And so you can run a patch after you deploy, or you can do whatever the heck you want with those generated resources. It's kind of like CloudFormation custom resources, except it runs locally on your machine and uses SDK calls. This seems impure and dangerous — and it is. But we had to do it. So believe it or not, AWS has bugs sometimes. And when we did this move to CloudFormation, we found some of those bugs and we were not stoked on them. We found bugs in particular with API Gateway that were pretty deal-breaking around binary content encoding. So we ended up having to write a patch that ran after deploy and did another AWS or did another API Gateway deploy, which is a bit of — it felt dirty. But it worked.
Jeremy: It does. It does. I'm right there with you on EventBridge, because that's the other thing. My latest project is using EventBridge, and you can’t create custom buses. And you can’t create, or you can’t add rules to custom buses through CloudFormation...
Brian: See, this makes me angry now. At this point, this makes me angry as a customer. I really feel Amazon is dropping the ball on this one. So CloudFormation is not a nice-to-have. The service, the team’s got to get together and have this on day zero for every one of their releases, if they expect us to be following the best practices they publish.
Jeremy: That would be nice.
Brian: Yeah, and I feel like it's not really a disadvantage because we can drop down to the SDK at any time. And there are actually times when you maybe do want to do that. So Architect has another semi, not well-known feature called Dirty Deploys. So we'll deploy using CloudFormation by default. And by default, we deploy to a staging stack, and we have a production stack, which is an extra step to get to. You could do your own arbitrary stacks if you want, but we bake in staging/production because we feel that's essential complexity. The other thing we can do is just deploy functions using update function code calls, and this is, the syntax for invoking it is “arc deploy dirty” and it is dirty. But what we'll do is we'll literally zip all your functions and we'll replace the ones in staging, and a Dirty Deploy usually runs for, say, 10 functions within two seconds. So your iteration speed is incredibly good on staging when you do these Dirty Deploys, better than CloudFormation even, although CloudFormation’s gotten pretty fast. It's not that slow anymore. It used to be really slow. Yeah, so sometimes you need to drop into that SDK, get a little bit dirty, and I think that's okay. But if Amazon's listening, they really got to get CloudFormation support day zero for every new product. That's table stakes these days.
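The shape of a deploy that skips CloudFormation and just replaces function code can be sketched as follows. This is a minimal Python illustration, assuming the functions already exist in the staging stack; `zip_function` and `dirty_deploy` are hypothetical helpers, the function name is made up, and the client is passed in so the sketch can be exercised with a stand-in. The one real API relied on is the Lambda `UpdateFunctionCode` operation, which boto3 exposes as `update_function_code(FunctionName=..., ZipFile=...)`. It is not Architect's own code.

```python
import io
import zipfile


def zip_function(files: dict[str, bytes]) -> bytes:
    """Build an in-memory zip archive from a mapping of file name to content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def dirty_deploy(lambda_client, functions: dict[str, dict[str, bytes]]) -> list[str]:
    """Push fresh code to each already-deployed function, one API call per function."""
    updated = []
    for function_name, files in functions.items():
        lambda_client.update_function_code(
            FunctionName=function_name,
            ZipFile=zip_function(files),
        )
        updated.append(function_name)
    return updated


class RecordingClient:
    """Stand-in client that records calls instead of talking to AWS."""

    def __init__(self):
        self.calls = []

    def update_function_code(self, FunctionName, ZipFile):
        self.calls.append((FunctionName, len(ZipFile)))


if __name__ == "__main__":
    client = RecordingClient()
    code = {"index.py": b"def handler(event, context): return {}"}
    print(dirty_deploy(client, {"testapp-staging-get-index": code}))
```

Passing `boto3.client("lambda")` instead of the recording client would make the same calls against real functions, which is why this kind of shortcut is best kept to a staging environment.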
Jeremy: Alright, so let's talk about microservices. It sounds like we're building a single application, right? And I know that what I do with the Serverless Framework or with SAM, is I will build multiple services with separate CloudFormation stacks as microservices and so forth. Is that something we can do with Architect as well?
Brian: Yeah, it's not super well documented, but we have a way for broadcasting SNS events between stacks. The service discovery allows them to talk to each other, and this has worked really well for Begin.com. I haven't concluded exactly the best way to do this yet. So Pub/Sub is great. And we have a lot of tools for doing it. We've got SNS, SQS, and EventBridge. It seems to me that the kind of sweet spot is actually combining these things into like where you would maybe broadcast an event, but it hits a queue, so you know you don't lose it, because the availability and processing guarantees between these things are a little bit different, and you sort of want the — you want the guarantees of SQS probably. Like that message got delivered. Maybe more than once. And yeah, so there are ways. And generally, I would say those ways are Pub/Sub. And I would also say that this is an interesting new ground to figure out. Once it gets really sophisticated, you might want to get into protocol buffers or something like that, but I don't know that devs really want to get into that. You know, we could get a lot done with just JSON payloads over Pub/Sub.
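Because a queue may deliver the same message more than once, the consumer usually needs to tolerate duplicates. Here is a minimal Python sketch of that idea with JSON payloads. The `IdempotentConsumer` class, the message IDs, and the event shape are assumptions for illustration; the in-memory set only works inside one process, so a real consumer would need a durable record of processed IDs.

```python
import json


class IdempotentConsumer:
    """Run a handler at most once per message ID, ignoring redelivered duplicates."""

    def __init__(self, handler):
        self._handler = handler
        self._seen: set[str] = set()  # in-memory only; a durable store is assumed in practice

    def receive(self, message_id: str, body: str) -> bool:
        """Process a JSON message once; return False for a redelivered duplicate."""
        if message_id in self._seen:
            return False
        self._handler(json.loads(body))
        self._seen.add(message_id)  # mark as done only after the handler succeeds
        return True


if __name__ == "__main__":
    processed = []
    consumer = IdempotentConsumer(processed.append)
    event = json.dumps({"type": "account-created", "accountID": "a1"})
    consumer.receive("msg-1", event)
    consumer.receive("msg-1", event)  # same message delivered again
    print(processed)
```

Marking the ID only after the handler returns means a failed attempt can be retried on the next delivery.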
Jeremy: Right. Yeah, that's actually, that's why I've been big into EventBridge lately. Because I do think that, I mean, I think they're ramping it up, and I know they get billions of messages that go through there every day. So obviously it's a very reliable service, and it's something that if we can make that part of our application, especially for cross-boundary communications, I think it would be really interesting. Of course, you do have the issue with service discovery, but...
Brian: Yeah, it's related. And I feel these are like, really the bleeding edge problems of the cloud right now, are service discovery, inter-app communication. Maybe these are just always problems too.
Jeremy: Function composition. I just talked to Rowan Udell about Step Functions for function composition, where it's great for certain asynchronous workflows, but, how much coupling do you want to create across service boundaries, and are Step Functions the right choice there, and what if you need something synchronous? But I don't know, there's just — there's a lot around that I think that causes confusion, especially when you go to that single responsibility principle.
Brian: Absolutely. You know, I think it also speaks to AWS’ maturity in the space. You know, you look at it, and it looks like they got a lot of Pub/Sub and a lot of databases, for some reason. Shit, they even have two ways to invoke HTTP events through API Gateway. But these things have different availability guarantees and different service guarantees and different limits. And you really have to, you can't just, like, abdicate thinking about it. You really got to dig in and understand: okay, are the characteristics of SQS appropriate for this use case? Do I want, you know, this thing to retry forever? Or do I want to fail at some point, like, that kind of thing. Or, like database is another really good one. I mean, there just is not going to be a database that fits all workloads. And so Amazon has a lot of database products.
Jeremy: Although Oracle says their database can handle all of those use cases. You know, I’ve heard that lately.
Brian: Yeah. I don't pay too close attention to Oracle and the cloud.
Jeremy: Yeah, that’s probably not worth it. Alright, so just one last thing on building apps. So another thing we're seeing a lot with modern applications, is a lot of front end developers hosting static sites, right? So what's the Architect solution for that?
Brian: Yeah, this one actually came at us a little bit of a blindside. So we built out Begin in — the earliest version of it — around late 2014, 2015, 2016. And I kind of missed the rise of the static app. I was a part of the PhoneGap team, so I was close to the rise of the single page app. But I did not participate in the rise of the static app. So when Architect was first released, a lot of people struggled using it to build things because we put a function at get / and we make it greedy. And that's a Lambda function that's returning HTML. Oh, my God, why would you do that? It works really great if you server render, which you probably do want to do. But if your content is inert and it's not changing, then you know, static sites make a lot of sense. It especially makes sense for your landing page and that kind of thing. So we re-tooled Architect in version 6 so that the root of your application is a public folder, which we greedily proxy. So anything in that folder that you compile to with, you know, Gatsby or React or whatever will be available at the root of your application. And then any functions that you add will be mounted at sub URLs so you could have post GraphQL, for example, would call a Lambda function. But you know, all your static assets could just live in public. And this seems to be the architecture that people really want these days. We don't do it ourselves. For what it's worth, we server render things through Lambda functions, which at first sounds disturbing and slow to people. But you have to remember that we put these things behind the CDN. So your Lambda function’s not getting invoked a lot. It's getting invoked maybe once a day or something like that.
Jeremy: Yeah, because you can send — because actually, that's one of the things I think maybe people don't know is that API Gateway has a CloudFront distribution in front of it, and so if you send back the right caching headers, then it will cache that for you and not hit the backend.
Brian: Yep, and CloudFront’s a great solution. You know, if we actually do the regional API thing and then we put a CloudFront just in front of the, you know, the blanket URL and get rid of that ugly /staging or /production that you get. Yeah, it works great. CloudFront takes standard caching headers. Even better if you're putting your stuff in S3 and you upload your content using our deploy. We’ll set all of the headers and content type for you on the S3 buckets so the ETags come back correctly, and you get a cache for free basically, and you know, you're going to see sub-10 millisecond responses on the majority of your content. You know, once in a while, you'll get a cold start and it'll be 200 milliseconds. It’s like, fine.
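For the server-rendered case, the caching behavior comes down to the headers the function returns. Here is a minimal Python sketch of a Lambda proxy-style response carrying a standard `Cache-Control` header. The `html_response` helper, the page content, and the one-day `max-age` are assumptions chosen to match the "invoked maybe once a day" remark; the right value depends on how often the content changes.

```python
def html_response(body: str, max_age_seconds: int = 86400) -> dict:
    """Build a Lambda proxy-style response that a CDN is allowed to cache."""
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "text/html; charset=utf-8",
            # Standard caching header: any cache may keep this for max_age_seconds.
            "Cache-Control": f"public, max-age={max_age_seconds}",
        },
        "body": body,
    }


def handler(event, context):
    """Hypothetical handler for a server-rendered 'get /about' route."""
    return html_response("<h1>About</h1>")


if __name__ == "__main__":
    print(handler({}, None)["headers"]["Cache-Control"])
```

With a header like this, repeat requests within the day can be answered from the CDN cache and the function is only invoked when the cached copy expires.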
Jeremy: Still pretty good. Alright, so let's talk about some of these primitives, cause this is again, this is one of the things about the Architect Framework that I really, really like. That you have, I think, what, 12 different primitive services and 12 different primitives, maybe? Maybe you can explain it better than I can. You built it.
Brian: I'm one of the people that built it. I didn't build it exclusively. There's actually quite a few people hacking on it nowadays. And I have to go to the website just to remember myself. So when we first started, you know, we were building CRUD-y web apps, and I remember we actually had a piece of graph paper out, and we were figuring out, you know, what we needed to build, and then which Amazon services facilitated those. And so we've ballooned to 12 services, but it only used to be eight. The services are Lambda, API Gateway, S3, SNS, SQS, DynamoDB and CloudWatch Events. And then there's a sort of supporting cast, services that we use that you don't really see and that don't really come into play very often: CloudFormation, obviously, Route 53, CloudFront, Parameter Store, and IAM kind of set up the supporting side, and that's it. That is the core of Architect. We just paper over those services and make it really easy to build a web app. We kind of felt that those, you know, facilitated CRUD. They facilitate background tasks that could be long running. You can put a CDN in front of this stuff. You can host static assets. You know, this is basically everything that you would need to build the majority of a web-based app today.
Jeremy: Yeah, that makes sense.
Brian: It isn't to say that you don't have those other use cases, and that's why we dump out the CloudFormation for you and let you modify it, because you will have other use cases. We've got a few macros floating around out there already. The most complex one is for uploading directly to S3. So directly uploading to S3 sounds like it should be an easy thing to do, but it turns out it takes a fair amount of CloudFormation to pull off. And so we've wrapped all that up and made it a single line directive inside of Arc. And it's a good example of something that we didn't build it for initially, but were able to extend it into.
Jeremy: Nice. All right, so what about limits in AWS? Because that's one of the things about AWS, which is great, is they do publish all their limits and people are like, “Oh, well this has a limit.” Yeah, well, everything has limits, but it's nice to know what those are. So does Arc sort of deal with those gracefully?
Brian: Yeah, and I feel this is worth a shout-out because this is something the other clouds don't do well. They sort of claim that they'll handle it all, and they leave it up to you to discover where their failure rates happen, when you're gonna get throttled, when you're going to overwhelm them and that kind of thing. And that's a bad experience. I want to know what the service boundaries are, so I can design my application for them. So the big one that you run into when you start working with CloudFormation is the resource limit of 200. We do some tricky nesting to get around that limit. Otherwise, I kind of don't feel the limits are the limiting factor anymore. There was a time when it felt like, you know, Lambda needed a lot more memory, and Lambda is going to be better once we get X. But that time has passed. We have tons of memory. The execution limits are pretty generous. And I haven't run into problems with that. A while back, I actually heard someone concerned about DynamoDB limits, which I thought was pretty laughable because DynamoDB has an extremely large amount of potential theoretical throughput and potentially infinite storage, guaranteeing single digit millisecond latency queries. These are characteristics I have never seen in a database. I don't think there is another database with these characteristics, so yeah, it's got limits, and they publish them. But I view this as a positive and not a negative. And it helps you build a better app. You know what you're in for.
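The "tricky nesting" is not spelled out in the conversation, so the following Python sketch is only one plausible way to stay under a per-template resource limit, not a description of what Architect does. It splits a flat map of resources into child templates of at most `limit` resources and builds a parent template that references each child through an `AWS::CloudFormation::Stack` resource. The function names and the `base_url` placeholder are hypothetical, the limit of 200 is the figure quoted above, and references between resources that end up in different child stacks are ignored entirely.

```python
from typing import Dict, List, Tuple


def chunk_resources(resources: Dict[str, dict], limit: int = 200) -> List[Dict[str, dict]]:
    """Split resources into groups of at most `limit`, in sorted name order."""
    names = sorted(resources)
    return [
        {name: resources[name] for name in names[i:i + limit]}
        for i in range(0, len(names), limit)
    ]


def build_templates(
    resources: Dict[str, dict],
    limit: int = 200,
    base_url: str = "https://example-bucket.s3.amazonaws.com/nested",  # placeholder
) -> Tuple[dict, List[dict]]:
    """Return (parent_template, child_templates) using nested stacks."""
    children = [
        {"AWSTemplateFormatVersion": "2010-09-09", "Resources": chunk}
        for chunk in chunk_resources(resources, limit)
    ]
    parent = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Each child template is pulled in as one nested stack resource.
            f"Nested{i}": {
                "Type": "AWS::CloudFormation::Stack",
                "Properties": {"TemplateURL": f"{base_url}/child-{i}.json"},
            }
            for i in range(len(children))
        },
    }
    return parent, children
```

The parent template then carries only one resource per child, so the per-template count stays small even when the application as a whole has several hundred resources.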
Jeremy: All right. So let's talk about DynamoDB. Because I know you mentioned that earlier. I think I heard you give a talk about DynamoDB one time. So DynamoDB is sort of woven into Architect. It’s sort of the database of choice, or, you know, the default database, I guess you would say, within the Architect Framework. So why DynamoDB? Maybe let's start with that.
Brian: Yeah, I mean, it's a decision making process. And it's one that a lot of people aren't comfortable with. It’s a managed database, which is a nice way of saying that it's a proprietary database. It's owned and run privately by Amazon. And, you know, our industry has a long history of being gun-shy of these databases because of Oracle, frankly. And I don't blame anyone for painting Amazon with that brush. “Oh, my database. That's my data. I don't want them to have that. I want to control it.” The only people that say that, by the way, are people that have never sharded a database. Once you've sharded a database, you are happy to let someone else manage that for you. You are more than happy. How much does it cost? Fine. Less than a DBA. So that's going to be a good deal for me. So once you get over that initial concern, which isn't a real concern, by the way, that free tier is extremely generous. You could run a local instance of this thing yourself headlessly if you want for testing and building out locally, so you don't have this requirement of the cloud. And the free tier’s insane. I think you get something like 20GB in the free tier. So, like you could build a lot of app with 20GB. A lot of app. You could put images in there, you don't want to, but you could. Yeah, it's a great DB. I guess the other thing that people get a little tripped up on is the syntax. It’s a bit strange. It’s coming from a different world. I don't think it actually is that strange, for what it's worth. I'm pretty sure if you'd never seen SQL before and I showed it to you, you’d be like, “Well, that's strange.” I think it's just what you're used to. It's a sadly verbose query language. It takes a lot of directives in JSON form to make it do pretty trivial things. We've written a few higher level wrappers for it to make it a bit nicer to work with, but it's all about the semantics. Single digit millisecond latencies for up to a MB at a time querying, no matter how many rows I have? That's unreal. We've never had a database that can do that. And I'm happy to pay for that capability.
Jeremy: And I think you mentioned a good point about the query language and more so about how you go about getting data in and out of DynamoDB. Because getting data in is fairly simple. You just kind of put items. But it's not always just put item, right? Sometimes it's update item, and sometimes you need to use an update item to put an item, depending on what you're doing and whether you want to minimize lookups. Upserting, exactly. And you have to use these tricks like if_not_exists on the created date, if you don’t want that to get updated when you're overwriting a particular item, and you know, there's some interesting syntax, obviously the “begins_with” to select a part of the sort key. You know, there's a lot of really cool things you can do, but it’s certainly not very straightforward, in my opinion. I mean, I come from a SQL background. I’ve done SQL for 20-some-odd years. So for me, SQL is super easy, right? With DynamoDB I’m always thinking about a different way to access it: whether I have to be worried about overwriting different records, how efficient you need to be when you're grabbing data, and that you really can only grab data with the primary key or a GSI, and then add composite keys to efficiently filter on that sort key, so you don't have to add filters after that. So I do think there's a big learning curve there. I've talked about this on the podcast a million times, but at the end of the day, I don't think I would want to go back to SQL, especially since for most of the use cases that I have, I can use a DynamoDB table to handle that workload.
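The two tricks mentioned here, `if_not_exists` for upserts and `begins_with` on the sort key, can be sketched as plain request parameters. The Python below only builds dictionaries in the low-level shape that the DynamoDB `UpdateItem` and `Query` operations take (for example through boto3's client `update_item(**params)` and `query(**params)`); it makes no network calls. The table name, the `pk`/`sk` key names, and the `title`, `createdAt`, and `updatedAt` attributes are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def upsert_params(table: str, pk: str, sk: str, title: str,
                  now: Optional[str] = None) -> dict:
    """Build UpdateItem parameters that create or update in a single call."""
    now = now or datetime.now(timezone.utc).isoformat()
    return {
        "TableName": table,
        "Key": {"pk": {"S": pk}, "sk": {"S": sk}},
        # createdAt is only written the first time; later updates keep it.
        "UpdateExpression": (
            "SET #title = :title, updatedAt = :now, "
            "createdAt = if_not_exists(createdAt, :now)"
        ),
        "ExpressionAttributeNames": {"#title": "title"},
        "ExpressionAttributeValues": {
            ":title": {"S": title},
            ":now": {"S": now},
        },
    }


def query_by_prefix_params(table: str, pk: str, sk_prefix: str) -> dict:
    """Build Query parameters that select items whose sort key has a prefix."""
    return {
        "TableName": table,
        "KeyConditionExpression": "pk = :pk AND begins_with(sk, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": pk},
            ":prefix": {"S": sk_prefix},
        },
    }
```

With a composite sort key such as `note#2019-10-07`, a prefix of `note#2019-10` narrows the query to one month using the key condition alone, with no filter applied after the read.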
Brian: Yeah. Me too. And I'm sure there are, and actually, I want to give some props to Erica Windisch, about opening my mind on this one a while back. I was sort of kvetching about the cost of Dynamo, and we had a lot of rows for not very changing data. And they correctly were like, “Well, why don't you just dump that into S3 and use S3 Select?” I was like, lightbulb went off. I was like, “Wait a second. Can I do that?” Yeah, you can, and you can get crazy good querying speeds out of S3 and S3 Select. And this is sort of the beauty of the serverless managed database cloud in that, it's no longer a tradeoff. We don't have to put it just in Dynamo, or just in S3 or just in RDS. Why not all three? We can use Dynamo streams to pump all that data into Redshift. If we really want to use SQL to query it. If the data is historic and not changing very often, maybe just dump it in S3 and read it with batch and Select.
Jeremy: Or Athena.
Jeremy: It’s great for data that doesn't change. Any time series data, Athena is amazing.
Brian: Yeah. So I kind of think where we're at now is less about making a tradeoff choice and more about what are we opting into, for what characteristics, and when? You're absolutely right. I don't want to trivialize querying in Dynamo. It takes a minute, and it's not the easiest to model, and you're probably going to get it wrong the first time. And that's totally okay, because your iteration speed is already 100 times better than it was before, so you're going to be able to fix it. And my recommendation to people is to just dive in, you know, maybe model it a little bit relational and feel that pain and start learning about wide column design. A neat thing about this key value store thing is that your skills with Dynamo are transferrable to Mongo and Cassandra and these other key value stores. They all model roughly the same way, where you start with the query and build out your columns from the query. So yeah, I get it. I talk to friends about this all the time, and they're like, “No, I want to use RDS,” and I'm like, “Yeah, I know.”
Jeremy: A lot of people still do. I mean, that's the other thing, too, is that a lot of companies still do the whole sharding thing, right? And for certain companies, sharding is not that difficult, if you have a very good key that you can use to shard on, right? So like Slack, that's easy. It's a workspace or whatever it is that they can easily just shard on. But when you build applications where there's a lot of intercommunication between them, you're going to be doing a bunch of denormalization in relational databases. So why not stick it into Dynamo, do the denormalization and get the speed benefits without that overhead of doing all that sharding?
Brian: Yeah, or do both, you know, if you really do need it. I think there is a data gravity thing here too. There's a lot of not just skill investment, but actual literal rows in a database sitting there right now. And if you're in BIG CO and you've been around forever and you've got, you know, GB of data in SQL, you're not going to stream that into a Dynamo table. There is going to have to be a different story for how that thing gets migrated and/or how those apps evolve into the serverless world. It's not a zero sum game. But I think if you're using Lambda and you want to play on easy mode and get the best performance characteristics, Dynamo's a no brainer.
Jeremy: So what about lock-in, right? I don't know if we mentioned this earlier, but this idea of lock-in is talked about all the time. Obviously, you said your skills are transferrable, but not all of your data or code might be as transferrable with Dynamo. So what's your thought on that?
Brian: Yeah, the lock-in discussion. I like to dig into it when people bring it up, so, you know, there are concerns with lock-in, and one of the primary concerns should be price. This is the lock-in a lot of people suffered with Oracle, where they squeeze you as the years go on and your data becomes harder to move. Amazon doesn't really raise prices. I haven't seen or heard of an instance where they did that historically in the last 10 years. So maybe that'll happen, but I'm not betting on it. It's a pretty competitive market, and they're really interested in margins. We know that Jeff Bezos always says your margin’s my opportunity. So I don't see the database getting more expensive, because I do feel this is the main anchor differentiation between clouds, and right now Dynamo’s in a really good position, so it's a little bit expensive. But as Spanner and Cosmos get better, they're going to start competing on price, which Amazon is more than happy to do, so I expect price to go down. That's not really a lock-in concern. Another lock-in concern is they shut the service down. Well, Amazon’s still running SimpleDB.
Brian: So, if there's anyone on that, they don't shut things down. That's what Google does. So I'm not worried about Amazon shutting it down, so the next lock-in concern would be breaking changes. To be honest with you, I kind of wish Amazon would do some breaking changes once in a while, but they don't. And if you want evidence of that, go look at the S3 API. They literally have API methods that have V2 in the name of the method. Amazon only does additive change, so you're not going to suffer a breaking change. You're not going to suffer a service shutdown. You're not going to suffer price pumping, so I don't know what the objection is to lock-in. Sure, there's got to be another one. I'm sure someone's going to cook one up, but it's just not a rigorous argument. And for my time, the danger is picking the non-Amazon that goes away. So if my solution to lock-in is to use a venture-backed third-party vendor that's privately held, then I have done some very poor risk analysis because we all know how that story goes. A privately-held venture-backed company is looking for an exit.
Jeremy: And probably hiring somebody specifically to deal with that different type of technology or whatever, right? That's that DBA concern. It's just, you know, even if Amazon did raise their prices, it’s probably going to be a lot cheaper than paying a bunch of DBAs to keep re-balancing the MongoDB cluster or the Cassandra rings or whatever, right? I mean, anyway, I totally agree with you on that. Alright, so we've been talking for a very long time. We probably could keep talking for a while, but I do want to get to Begin.com, because this is a super interesting thing.
Brian: We should do that.
Jeremy: It has been in private beta for quite some time, but you've got some news, right?
Brian: I have some news. I'm stoked to announce that Begin is now publicly available. Anyone can try it out. We have a free tier where you can deploy an app serverlessly to AWS in 30 seconds or less. Usually takes around 10 seconds, but we have an internal benchmark of 30 seconds. It’s CI/CD, but serverlessly. So CI/CD is not news. It's been around forever, and there's tons of people that do it, but most of them are for traditional architectures and haven't really taken advantage of this serverless world. And so Begin is a fully ground-up cloud native serverless deployment and CI/CD service. And as a result, our build times usually, at most, expand into a minute. Usually they're around 30 seconds. That's great lead time to production. Lead time to production is the main metric by which companies live and die and we give you an extreme advantage to that. And it's all just Architect. So you can eject any time and run on your own end of AWS. Our paid tier will target your AWS. Our free tier is running on our AWS, and I think this is an interesting thing: there’s a huge Amazon community, obviously, a huge amount of people building for the cloud, but I talk to a lot of newer devs and they're really intimidated by AWS. They don't know how to get started and they don't know where to get started, and they find it to be just so overwhelming. And it's just way easier to get started with someone else, and Begin is seeking to fix that problem. They shouldn't have to get started with someone else. They should be able to deploy straight to their Amazon within 30 seconds. That's our goal.
Jeremy: That's awesome. Alright, well, listen, Brian, thank you so much for taking the time to talk to me, sharing all of your knowledge with the community, all the open source stuff that you've done. So how can listeners find out more about you, Architect, Begin, all that stuff?
Brian: Yeah, Begin.com. It's open. So go log in with your GitHub. If you want to find me, @brianleroux on Twitter and GitHub and I usually respond pretty quickly. And if you want to learn more about Architect, you can go to arc.codes.
Jeremy: Awesome. Alright, I will get all of that into the show notes. Thanks again, Brian.
Brian: Thanks, man.