Episode #123: APIs and the Evolution of Serverless with Dorian Smiley

February 7, 2022 • 60 minutes

On this episode, Jeremy and Rebecca chat with Dorian Smiley about how Brainly used serverless to turn 4 developers into 50, how the API economy could reshape cloud architecture, what the next evolution of serverless and cloud development looks like, and so much more.

Dorian Smiley is a dedicated full-stack engineer with more than 15 years of experience.  He is currently the VP of Technology at Brainly, the world's largest peer-to-peer learning community for students, parents and teachers. Prior to joining Brainly, Dorian spent a decade with Silicon Publishing Inc., first as Sr. Software Architect, and later as its Chief Scientific Officer. His extensive professional experience includes work with cloud native applications, microservices, serverless, big data architectures, PWAs, MEAN, MERN, and LAMP stacks.

This episode is sponsored by Stream and Dexecure.

Transcript

Jeremy: Hi everyone. I'm Jeremy Daly.

Rebecca: And I'm Rebecca Marshburn and we are back.

Jeremy: We are back. It's good to see you, Rebecca. It's been a while. I think the last time I saw you was at re:Invent.

Rebecca: That's almost true, but the last time I saw you was that wonderful Christmas card you sent me of you and your family. And it legitimately warmed the entire mailbox. I was so happy. So I've seen you and your whole family on my fridge for a little while now. Thank you for that.

Jeremy: Well, you're welcome. I'm just glad that the rest of my family is attractive enough to make the card look good. Because if it was just me, it probably wouldn't. But anyway, so speaking of holiday cards, how was your holiday break?

Rebecca: My holiday break was really great. I think we skipped one little thing, considering that we've been on break. We didn't tell anyone what they are listening to. If you maybe want to go ahead and let the cat out of the bag.

Jeremy: Yeah. You are listening to Serverless Chats and we are back starting 2022. We've got a whole bunch of great guests lined up, so this should be super exciting. So why don't you tell us quickly, Rebecca, about your holiday break? Because I know you did something exciting. And then we can get into the show.

Rebecca: Oh, okay. That's very sweet. Holiday break was great. I got to spend Christmas itself with my grandfather back in Virginia and then I flew directly from there to Chile and Argentina. And so I spent some time there living my past life in Spanish and seeing some old friends and doing some strange and fun and normal things in the summertime side of the world. Yeah.

Jeremy: That's amazing. That's amazing. Good for you. I did absolutely nothing other than spend some time with friends and family. But also not that bad. So anyways, let's introduce our guest. Do you want to introduce our guest?

Rebecca: I would love to introduce our guest today. Our first guest of 2022 is the VP of Technology at Brainly, Dorian Smiley. Hey Dorian. Thank you for joining us.

Dorian: Hello, hello. No problem. No problem. Excited to be here. Very excited to be here. I love the show by the way.

Jeremy: Thank you.

Rebecca: Thank you very much. So why don't you tell the audience a bit about yourself and then a bit about what Brainly does. I hear it's like the biggest ... How do they make that joke around ... Oh, I'm going to ruin this joke about someone being huge in Germany, but Brainly is huge in Poland. Anyway. I ruined the joke, but we're not going to cut it out. We're going to leave it in. It's the first joke in 2022.

Dorian: Okay. First joke in 2022. All right.

Jeremy: It can only get better from here.

Rebecca: It can only get better.

Dorian: So what do I do? VP of Technology at Brainly. I spend my days figuring out ways to make engineering teams go faster. How do we scale? How do we grow this company to potentially thousands of engineers contributing to a monorepo and crank out product as fast as possible that delivers value to our customers? And that involves a lot of things from things that are consumer facing to things that are fully backend tools we use, line of business applications, all that kind of stuff.

Dorian: Brainly is an ed tech platform. We specialize in homework help, but we're quickly expanding into other areas. We have 350 million unique monthly users. I believe we're in 10 different countries. We have had tremendous success in building online communities and we continue to sort of advance that in 2022. We have some really exciting things in the works that I can't talk about here. But the future is looking very bright for Brainly and I'm generally passionate about changing education generally. For example, I work independently to advance software apprenticeships. We may have some exciting announcements at Brainly soon about that, but nontraditional tracks towards becoming a software engineer that expands the hiring pool. It includes more people who might not traditionally be able to afford a four year education, but also people who don't want to go to four year school. I, myself am self-taught. I didn't go to college for computer science. I spent 20 years learning the trade. So I'm generally passionate about nontraditional tracks in education generally. So Brainly's an amazing place to be for me. It's pretty incredible. Yeah.

Jeremy: Yeah. That's awesome. And I totally agree with you on the nontraditional tracks for learning, not only software development, but just so many cool things and so many great resources out there now for people that want to change careers. And it's funny, I don't remember who had this quote, but basically it was like, no matter what industry you're working in or what company you're working for, all companies are software companies now. So pretty much, no matter what company you work for is going to have some sort of IT, some sort of web or cloud or whatever team that is building software. And I mean, even if you're not learning how to be a computer scientist or a full on engineer, even just getting your feet wet with some of that stuff is really great. So yeah, that's good stuff.

Dorian: Absolutely. Absolutely. But that's me. So that's what I do.

Jeremy: Let's talk about Brainly a little bit because I do want to get into ... What you're doing, again, is crazy. And of course I think that if anybody was a parent or has been a parent with school-aged children over the course of the last 20, 24 months or 23 months, 22 months now, or whatever it is, certainly have a love-hate relationship with online learning probably, and that sort of Zoom calls and whatever. But the funny thing is well before the pandemic hit and we basically shifted to complete online instruction, which may not be the best for some people, but there was always these homework helper things, or math quizzes and all these kind of things that education or at least homework, a lot of that started moving online.

Jeremy: Google Classroom. All these other things of just really integrating classrooms or integrating education into the web and making it, I think, more accessible. I mean, even what was great about the pandemic was ... Nothing was great about the pandemic, but I will say that when students couldn't go to school in person, even though some kids might have been in school and some kids weren't, the kids that had to miss school for whatever reason, they could also participate online. There was a lot of that hybrid stuff going on, which I thought was an interesting approach. But anyways, let's get into what Brainly does because it's not just about a platform for learning or for homework help or building community stuff. It's actually really, really interesting from a technology standpoint, which again is one of the things we like to talk about here. So can you tell us a little bit about that infrastructure that's powering Brainly and really what are you doing there that is using serverless?

Dorian: The infrastructure question is tough because I don't know what I'm allowed to say in terms of exactly how our infrastructure is built. Like I said, we do operate in 10 different countries and that machinery is pretty complicated in terms of how we service those markets, how we provide a thin layer of customization that allows us to individualize that experience and one that is tailored for that market. I think I can say generally that we heavily leverage container services. We're fully on AWS. We primarily use serverless for line of business applications today and back office applications. Common things that my teams are using serverless for would include things like our dashboards that monitor team health, that monitor pull requests, that are looking at our pipelines and that are generally used by engineers to figure out how healthy are the teams around them and how are things going.

Dorian: So that's the primary application for us. But we are also heavily looking at it for transaction processing, event streaming. Things where there isn't a low latency requirement that's in single digits or maybe 10 or 12 milliseconds in there. We have some really crazy latency requirements around our backend services that power our front end applications and often serverless falls short of meeting those requirements globally for all of our users. And we just can't produce the kind of performance typically that we get out of the way we figured out how to deploy container services. So we have this really amazing backend infrastructure that some really incredible people have built over the years that is producing some tremendous results. And it shows up in our SEO scores and our Core Web Vitals scores.

Dorian: We can see where that's made huge improvements. So we measure that impact and we have compared serverless equivalents, but I think for our customers and our consumers who use our product, where serverless will have the most impact will probably be in event streaming and transaction processing. Where we're looking to compose our future infrastructure ... And again, I don't want to get too into what we do currently, but I can say our future infrastructure will generally be more of a FaaS environment where a lot of the heavy lifting and the complicated things that take time will probably be composed out of this polyglot sort of Lambda environment where we heavily leverage Step Functions or maybe we write our own saga controllers to quickly compose these things. And I think that might be the future of some of the things that we're doing in the consumer space. But yeah, we're containers, we're EKS for a lot of the consumer facing things. Yeah.

Rebecca: Ooh. I had this question. I was like, well, first I really want to know why serverless, but I think you read my mind on that. However, I did have this. And so maybe I was thinking about it wrong, and I would love for you to take a moment to explain it. I was imagining that serverless would come into play, certainly that Brainly's traffic follows the peaks and spikes of a school year in many ways. And there's-

Dorian: Or day. A school day too.

Rebecca: School days rather. Yeah. And there's multiple Christmas day sorts of events for retail, but for school that happens multiple times, not only across a day, but across a year, across seasons. And so I was wondering if I was anywhere in the ballpark in terms of what were some of those discussion points that you all had around serverless and those conversational directions where you're like, "Hey, if we're trying to match these peaks and valleys, and we're trying to map them, overlay them." Was that one of the discussion points or was that something that you're going to say your assumption's real wrong there?

Dorian: I think your assumption's wrong in that the limits imposed by AWS would make any discussion about serverless moot in terms of how could I plan an infrastructure in which I could deal with those peaks and valleys in a predictable way, model it ahead of time, and know what my costs are going to be? Or within plus or minus 20% or something of what I'm trying to figure out. AWS makes that virtually impossible at this scale. And also the need for shared memory is really real when you're trying to reach latencies that are in the single digits. So it's just not ... I think a lot of it was thrown out just from the standpoint of we know what our requirements are, we have a general idea of where the traffic is at and that everything with AWS would be highly uncertain if we chose serverless at that scale, if that makes sense. So we didn't even really get to the stage where we would be modeling how would we scale the system up or down. It was really like that option is off the table unless we were to somehow negotiate a deal with AWS where we would have infinite scaling or something. So yeah. That's sort of where we landed I think.

Rebecca: Before Jeremy asks his next question, I would like to point out, do a little math here. I am 0 for two. And so we're really going to try to bring up the game here a little.

Jeremy: You know what's funny is, again, I've heard this before too, especially serverless at scale is one of those things where it's great for the highs and lows. For certain traffic, it's great. But then you run into these things where you have very, very constant workloads that scale up and down. And again, FaaS gets very, very expensive. I mean, the actual compute cost that they charge you for Lambda is, I don't know, a hundred times what it would normally be if you were to manage it yourself. There's obviously benefits to having that and certainly I think from a smaller scale or for startups, setting up Kubernetes clusters and doing some of these other things are probably way, way too complex and you can't move as fast as you otherwise could.

Jeremy: And when you finally get to a scale like Brainly does, it makes sense that you've got to try to find cost optimizations. It would be great if serverless could get to that point where you had those cost optimizations. But what I heard you mention earlier when you were talking about what you're doing with serverless, it sounds like a lot of it is not just DevOps. I mean, I know you mentioned going into the idea of doing the eventing with it as well. But just in terms of those use cases that you have internally, that's another big thing that we see with serverless all the time where it's like teams say, "Well, we want to automate this or we want to do this, or we need to build a dashboard internally and we need to query five different APIs and assemble some data, whatever." That serverless is a really good tool for that.

Dorian: Absolutely. Let me try and explain the value I was able to generate for my team. So we built a serverless dashboard for the technology org and it's monitoring everything from personal metrics to how many commits people are doing and which teams are issuing the most commits, to visual regression testing, to code quality and automation, to team health, our bug cycle time, our Core Web Vitals scores. And that is delivering huge amounts of value. It's allowing our infra teams to go out and make improvements and to find ways in which we can innovate in the technology org to build new tools that are solving real problems that we see through this. And what was awesome about that is serverless turned my four developers who are dedicated to that project into 50. Because it was like none of the infrastructure and machinery that we need to run our container services and those clusters was required to get this up and running.

Dorian: In fact, we had a lot of benefits from automation that was already built by people in our production infra teams and our office infra teams. And that's another thing that's great about serverless is that if you engineer it right, not only is your work product able to plug into these other environments ... What I mean by engineer it right is that you properly use Lambda the way it's supposed to be used, where you use layers as you would use an L7 proxy or make sure that your handler contains the details of the Lambda protocol and nothing else. You use an SDK function that takes a value object and returns a value object. Because we built all of this internal machinery that way, it's able to be used downstream in ways we can't predict.

Dorian: So our functions that are doing things like returning aggregates of metrics that are in DynamoDB or that are being used for visual regression testing, those can be leveraged in pretty much any workload in Brainly whether you're running serverless or whether you're not. But I think serverless puts you in a mindset in which you disaggregate all these things and you're able to reuse the code more effectively. And so to me that is setting us up for future growth for the internal side that is going to ... It's like a springboard basically. And so, yeah, serverless delivers a whole lot of value, not only in the tools we're able to build and how fast we're able to build them, but I also think the way it forces you as an engineer to figure out how to write your code in the most effective way possible just has value to the organization by itself. If a software auditor was to come in and look at that, you would get A-plus marks. Whereas unopinionated environments like Kubernetes or ... Well, not Kubernetes necessarily. Say Docker and container services. Kind of allow the developer to do whatever they want and sometimes that's dangerous. So I find a lot of value in serverless beyond just how efficient it can make a team. It's sort of the mindset it puts you in as a developer.
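As a rough sketch of the handler pattern Dorian describes here, a thin Lambda handler that only speaks the Lambda protocol and delegates to a value-object SDK function might look like the following. The table, type, and function names are hypothetical illustrations, not Brainly's actual code.

```typescript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, QueryCommand } from '@aws-sdk/lib-dynamodb';
import type { APIGatewayProxyEventV2, APIGatewayProxyResultV2 } from 'aws-lambda';

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// SDK function: plain value object in, plain value object out.
// It knows nothing about Lambda, API Gateway, or how it was invoked,
// so it can be reused from any other workload.
export interface MetricsQuery { teamId: string; metric: string; }
export interface MetricsAggregate { teamId: string; metric: string; total: number; }

export async function aggregateTeamMetric(query: MetricsQuery): Promise<MetricsAggregate> {
  const result = await docClient.send(new QueryCommand({
    TableName: process.env.TEAM_METRICS_TABLE, // hypothetical table name
    KeyConditionExpression: 'teamId = :t',
    ExpressionAttributeValues: { ':t': query.teamId },
  }));
  const total = (result.Items ?? [])
    .reduce((sum, item) => sum + (Number(item[query.metric]) || 0), 0);
  return { teamId: query.teamId, metric: query.metric, total };
}

// Handler: contains only the details of the Lambda/HTTP protocol and nothing else.
export async function handler(event: APIGatewayProxyEventV2): Promise<APIGatewayProxyResultV2> {
  const query: MetricsQuery = JSON.parse(event.body ?? '{}');
  const aggregate = await aggregateTeamMetric(query);
  return { statusCode: 200, body: JSON.stringify(aggregate) };
}
```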

Rebecca: So you had said a very nice slogan. I actually wrote it down. I was like, Jeremy, that's a nice slogan. But how it can turn four developers into 50. Right?

Dorian: Yep.

Rebecca: And I'm wondering if with those four developers plus 46, those 50, better math this time, if you could tell us about how you're stack ranking some of the high-level projects that you're hoping to tackle, let's say in the first half of this year. What you think they'll be focused on in terms of serverless in Brainly.

Dorian: Well, a lot of the priorities aren't being set right now by priorities that I have for my team or my org. They're being driven by more macro forces and also higher level goals the company has. I mean, to answer you, a lot of the initiatives that I wanted to get through were sort of shelved in favor of working on these things that deliver immediate value to our investors and to our users. It's hard to make the argument sometimes that they relate somehow. So I'm not saying that there isn't a benefit to our investors or our users should we prioritize these things that I initially wanted to do. It's just where the priorities are at today and where we can have the most impact. But if I were in a perfect world, like if I were to go back and look at how I set my priorities originally, I think the thing that we're most focused on is how do we get pull requests done faster?

Dorian: The pull request bottleneck is a real problem, especially at organizations like ours. And we're scaling quickly. So we're hiring new engineers all the time. They need to get onboarded quicker and the product teams are being measured by what they're delivering. And so often the bottleneck in that whole process is how good is our training? How good is our boilerplate? How good is our code automation? And how good is the PR automation that sits on top of it? And so what's interesting here is we've chosen TypeScript to write most of this automation. And we use that in our GitHub Actions that do the code review. There's an amazing tool from Facebook called jscodeshift that does AST analysis. And what we did is we incorporated that into our PR review to find anti-patterns.

Dorian: We also have our own linters and our own linter rules we've written. But what's interesting is that we've got also this whole SDK that we built for Lambda that powers our dashboards and all these things lived together. They live in one monorepo. You're able to compose quickly. And I consider GitHub Actions serverless as well. For us, it's basically identical in the patterns. And so we have this beautiful SDK that is now enabling things like zero PR review for affected code paths. So if you're writing in a code path that we determine your domain owns, you don't need our review. Because we're relying on the automation we wrote to review that code and verify it's safe. You're not doing anything you shouldn't be doing. That's a productivity boost that is just mind-blowing, when you get into something like that.
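To make the jscodeshift piece concrete, a minimal codemod-style check of the kind that could run in a CI action might look like the sketch below. The specific anti-pattern (cross-domain imports of internals) and the failure behavior are assumptions for illustration, not Brainly's actual rules.

```typescript
import type { API, FileInfo } from 'jscodeshift';

// A detection-only "transform": it never rewrites source, it just fails the run
// when it finds the pattern we want to block, e.g. when executed with --dry in CI.
export default function detectAntiPatterns(file: FileInfo, api: API): string | undefined {
  const j = api.jscodeshift;
  const root = j(file.source);
  const findings: string[] = [];

  // Flag imports that reach into another domain's internals instead of its public API.
  root.find(j.ImportDeclaration).forEach((path) => {
    const source = String(path.node.source.value);
    if (source.includes('../../') && source.includes('/internal/')) {
      findings.push(`${file.path}: cross-domain import of ${source}`);
    }
  });

  if (findings.length > 0) {
    // Throwing makes the CI step fail loudly, which blocks the PR until it's fixed.
    throw new Error(`Anti-patterns found:\n${findings.join('\n')}`);
  }
  return undefined; // no changes to the source file
}
```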

Dorian: And it has immediate value to the organization, because you can say, "You remember all those problems you were having getting those PR reviews done? They're gone." We're not there yet. We're trying to get there. But I think that's ... To me, when I put it at a priority level, it's like, what gets the code through the pipeline fastest and how do we get to composable software? So those are the two things that I've kind of ranked the priorities by. And so the immediate one though is code through the pipeline faster. How do we do that? How do we leverage our serverless approach with GitHub Actions and how do we find reuse for that code down the road? And there might be other applications for it we haven't thought of at this point.

Dorian: But we know that the way we engineered it and the way we wrote it is that it could drop into any Lambda function tomorrow. Like you could automate that incorporation. So we use the Nx devkit for all of our code automation. You could easily just write a devkit function that ... Or a devkit ... Yeah. Devkit function that is going to go ahead and pre-generate your boilerplate to include any one of our SDK functions or a number of them. But that's how I rank it. How much code can we get through the pipe and how well can we use this to compose software and how can we compose the system out of these functions? Composability and throughput. That's it.
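For readers unfamiliar with Nx, a generator that pre-generates Lambda boilerplate wired to an internal SDK function could look roughly like this. The schema fields, template folder, and project layout are hypothetical; only the devkit helpers (Tree, generateFiles, formatFiles, names, joinPathFragments) are the library's own API.

```typescript
import { Tree, formatFiles, generateFiles, names, joinPathFragments } from '@nrwl/devkit';

// Hypothetical options a developer would pass to the generator.
interface LambdaGeneratorSchema {
  name: string;        // e.g. "team-metrics"
  project: string;     // target project in the monorepo
  sdkFunction: string; // which internal SDK function the handler should wrap
}

export default async function lambdaGenerator(tree: Tree, options: LambdaGeneratorSchema) {
  const { fileName, className } = names(options.name);

  // Copies EJS templates (handler + tests) from a local "files" folder and
  // substitutes the chosen SDK function, so every new Lambda starts from the
  // same opinionated boilerplate.
  generateFiles(
    tree,
    joinPathFragments(__dirname, 'files'),
    joinPathFragments('apps', options.project, 'src', 'lambdas', fileName),
    { fileName, className, sdkFunction: options.sdkFunction, tmpl: '' } // tmpl strips __tmpl__ suffixes
  );

  await formatFiles(tree);
}
```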

Jeremy: Right. Yeah. And even unrelated to serverless, just this idea of moving code quickly through the pipeline. I mean, you always have this thing where if you've got a PR that's open for more than a day or two days, and you've got hundreds of developers committing-

Dorian: Oh, it's crazy.

Jeremy: Submitting PRs. The second that PR doesn't get merged, you're going to have to review it again and update it and merge everything back into it from the main branch or whatever you're doing. And so that ability to move code quickly, I think that's another thing too for me. And I know that Rebecca has a follow up question, but for me, this is one of the biggest things. It's like the faster you can get code into production safely, it's a huge morale boost. We talked about this with Charity Majors and we talked about it with Bryan Scanlan. Just the ability to get code to production as fast as possible is not only just a huge productivity boost, it's a huge morale boost for your developers.

Dorian: Big time. Big time. I wrote a funny article called Tackling Pull Request Fatigue. I remember one day I was reviewing ... I don't know how many hours I spent. I was at Brainly and I think I was in hour three of PR review and I was looking at one that had 300 files plus modified. I was like, "Oh my God. This can't continue." But remember when I was telling you about how certain technologies lend themselves to good developer hygiene? And thinking in a serverless mindset, how are you thinking? If you really distill it down, you're thinking in terms of isolates, you're thinking in terms of environments where your function isn't part of this environment that's persistent and you don't get to write all this sloppy code that lets things hang around and leads to problems down the road and a big ball of goo that you don't understand. You decompose problems into really simple things.

Dorian: And then you compose those simple things to make complex things. A good example of that is when you're composing something in AWS Step Functions from Lambda. And you can even do that in a polyglot environment because AWS doesn't care what language you wrote those functions in. Now imagine trying to do a polyglot environment in Docker with container services. It makes hard problems simple and on a certain level, it forces the developer into a mindset where we don't get 300 files to review. We get several PRs with small bits of functionality that then gets composed up into a broader system. And this is only the result of having adopted serverless in 2017 because I saw the future for it and going through several years of like figuring this all out and coming up with patterns we know work and then realizing that, yeah, serverless was brilliant in that even in this really abstract way of leveraging compute, they've forced us into this paradigm of decompose the problem into something simple and then compose something complex out of that.
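For a sense of what that polyglot composition looks like, here is a minimal Amazon States Language sketch, written as a TypeScript const, that chains a Node.js Lambda into a Python Lambda. The workflow, function names, and ARNs are placeholders, not an actual Brainly pipeline.

```typescript
// The state machine doesn't care what language each Lambda is written in;
// it only sees function ARNs and the JSON passed between states.
export const reviewPipeline = {
  Comment: 'Compose small functions into a larger workflow',
  StartAt: 'AnalyzeDiff',
  States: {
    AnalyzeDiff: {
      Type: 'Task',
      Resource: 'arn:aws:lambda:us-east-1:123456789012:function:analyze-diff-node', // Node.js
      Next: 'ScoreRisk',
      Retry: [{ ErrorEquals: ['States.TaskFailed'], MaxAttempts: 2 }],
    },
    ScoreRisk: {
      Type: 'Task',
      Resource: 'arn:aws:lambda:us-east-1:123456789012:function:score-risk-python', // Python
      End: true,
    },
  },
} as const;
```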

Dorian: I think there's that overlooked part of what serverless is. I remember yours, you were talking about, what is serverless? What is a server? What is it? There are servers. It's not like there are no servers, right? What is it? To me, it's this really interesting way to promote simplicity in software design. And FaaS is really a reflection of that. It's like decompose the problem into composable parts. Well, that's FaaS. It's just we didn't have a runtime to do that for a long time. Serverless kind of gave us that runtime.

Rebecca: You had said it yourself. You've read my mind again, but I loved your post around PR fatigue. And what I think is interesting too is talking about this idea of productivity boost. I think a lot of people who are listening are going to be in your similar space in terms of we're at the beginning of the year, we see a way to get to very specific goals. Maybe those goals do or do not end up being taken on the table or some of them end up being shelved, et cetera, et cetera. But I do think this idea around productivity boost, it's not just ... To eliminate PR fatigue is not one of the inputs. It's like, you end up making people more productive, not because they have a better tool to make the input, but because it eliminates the thing that destroys their inner spirit on the output.

Dorian: Oh yeah.

Rebecca: Right?

Dorian: Totally.

Rebecca: And so there's actually a way to boost productivity, not about how do we give you a better keyboard or how do we make sure that your code environment is easier to read or whatever it is. It's really about that output of not being like, then I finish this thing and I'm deflated. And so I'm wondering how to expand this question into those ideas around what ... I'm sure you framed a lot of things like, "Hey, how productive can we make our developers and how productive do we want this engineering and development team to be?" How have some of those conversations ... When you have been able to implement the things or get them on the roadmap for 2022, what have some of the successful conversations gone like? And I'm curious because I think a lot of people are probably in your similar position where they're like, "We are missing out on these very productive moments that actually empower the team to do way better for the next 12 months to feel better, to stay longer, to be more excited about their work, et cetera." And so I don't know if you can talk about any of those conversational points or maybe at a high level.

Dorian: Oh yeah.

Rebecca: Moments that have been super successful.

Dorian: Yeah. At a high level I can, for sure. The first is have a fantastic boss. That's priority number one because they're going to help give you the freedom to do what they know needs to be done. And so just have a good boss and make sure that you communicate the value of these things articulately and can measure it. The other side to it is the reason I built a lot of these tools that we use today is so that I can actually deliver real world value that's measurable. That's the other thing as a developer, you have to find KPIs that show the improvement. I can drill down into the day and show my boss where our improvements are pushing more PRs through the pipe. I can show him where we're catching visual regression errors related to a style guide change.

Dorian: And I have the developers on record stating this. We have a lot of evidence to show the value and then your wonderful boss will go out there and fight the battles for you to get a lot of that. And I think a lot of it is that. It's relationships. So you've got to have the relationships and you've got to be able to measure the value and show it. It can't be hypothetical. But once you do that, I think that this concept you touched on, which is related to that albatross around the developer's neck of having to come into work every day. It's like you're sitting down reviewing 500 affected file PRs and you've got to do that for two hours.

Dorian: Who wants to do that? No one wants to come in and do that. Don't we have machines to do that for me yet? That's not fun work and it's deflating. But also what's not fun work and deflating is working in a stack that is just completely messed up. You go into a big ball of goo and every time you touch a line of code, you've broken something. That's one of the most deflating, demotivating things as a developer as well. And so when you pitch serverless integration and some of these design patterns I'm talking about, you also have to be focused on what is the ecosystem I'm creating in which developers are going to have a good experience. And what I find is that this whole concept of how we are moving towards building complex systems out of the lowest common denominator, which are the functions, and then on the front end, we're building out of components and into modules and into applications.

Dorian: And in serverless, for our workloads, we're building out of functions into step function workflows into pipelines like event streaming. That's sort of how we're building complexity in the serverless on the back end side. But that builds this ecosystem now where people are excited to come to work. It's like they can actually affect a line of code and we can test it and know it works and they don't have to worry about the surrounding system after that. There are no downstream effects that we can't effectively test for or measure. I mean, there always are, but in theory. It's a lot better than the big ball of goo that's got five billion lines of code and no one knows how it all relates and if you change something the only way you know it broke is to deploy it to production.

Dorian: It's kind of like that old quote from Rich Hickey: simplicity is hard work. Simplicity is hard work. It's really hard. And it's especially hard when you hand developers a tool that's unopinionated about how you build software. And often I see that's the downfall: organizations try to scale to support dozens of developers. And I'm talking that's a small scale, dozens of developers. That's why they often hit the wall, because they've never experienced a framework, like say the serverless framework or let's just say a custom homegrown framework within a monorepo that is opinionated about an ecosystem that you're working in. And that's what I love about what you're working on, Jeremy, is that it's going to have an opinion.

Jeremy: Right. Right. And so speaking of ecosystems and maybe even the opinionated piece of this, I think one of the things we've seen a huge evolution of over the last several years is managed services. And most of those managed services are essentially backed by, or fronted I should say, by an API. And APIs are now encapsulating all kinds of functionality, including full on data functionality. If you think about DynamoDB, you're not backing that data up, you're not worrying about scaling those servers. You're just making an API call and literally everything else happens behind the scenes for you. You've got Fauna, you've got MongoDB Atlas that's launching a new serverless version. So you've got all these different APIs that are really forming almost where it's like you kind of only need glue code in some cases to just kind of stitch a few different things together. So I know you wrote an article too, speaking of articles. You've written about the API economy, essentially, leapfrogging serverless. So just give us the premise of that article and maybe some of your thoughts on it.

Dorian: The premise is that serverless in reality is AWS specific. It hasn't yet delivered on the abstraction between all of ... It's not multi-cloud and it's not edge. It's really about AWS. And you still have to glue together a whole bunch of things to make an architecture. So I still need to glue together my Lambdas. I still need to write CloudFormation for a lot of things because not everything is written into an open source component that abstracts away that complexity. So I'm still writing a whole bunch of AWS control plane specific stuff. And along with a whole bunch of other things that I may compose incorrectly, that I may set security inappropriately around. We can get into arguments about whether resource level permissions are appropriate or you should be using something like STS always and you should never assign resource level permissions to anything in AWS.

Dorian: We can argue about that all day. But the point is that there is a lot of work that goes into that. I mean, we're spending more time doing that than we are writing the code. I could write a function in 30 seconds. Especially if you have a generator sitting on top of it. But then I spend three days troubleshooting why it doesn't work when I deploy it to AWS. And that's even if you're using the serverless framework and good abstraction tools. And it's still not multi-cloud. But if I were to think of what is the perfect way I could abstract that away? Well, one of them's an API. And especially if you think of something like Fauna, right?

Dorian: Where you have this distributed database, it's amazing, it does what I want to do. Or something like Dgraph, where I want GraphQL and an actual graph database. Who would've thought that would be a thing? Of course it's a thing. But I don't need to worry about how they deploy their infrastructure and they could actually solve the multi-cloud problem and I would never know it. I could write my code once and I have the abstraction built in. I'm just making an API call. And I don't need to spend any time writing infrastructure as code. And I could actually build an app on Vercel and be up and running in a global edge compute network in an hour without one line of AWS control plane code. Which I would love. That's sort of like my guiding light.
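That "just make an API call" model is easy to picture in code. A minimal sketch against Fauna's JavaScript driver, with no infrastructure as code anywhere in sight, might look like this; the collection and index names are made up for illustration.

```typescript
import faunadb, { query as q } from 'faunadb';

// The database's scaling, replication, and backups all live behind this client;
// the only thing the app ships is application code plus a secret.
const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET! });

export async function getStudentProfile(userId: string) {
  // "profiles_by_user_id" is a hypothetical index name for this example.
  return client.query(
    q.Get(q.Match(q.Index('profiles_by_user_id'), userId))
  );
}
```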

Dorian: If I can write an app without any AWS control plane specific code, I've achieved something important. And I think this is something AWS has got to wake up to. There's this opportunity cost that's racking up in their balance sheet right now. And they don't seem to understand it, but people are ... I don't think they're going to suffer that much because the people building these services often build the abstraction layer on top of AWS. To them, that's a win. They're still going to get the business. What I think they need to pay attention to is that these services that no one is going to use because they're too complicated to deal with their garbage control plane are going to be like this albatross around their neck or this weight sinking them down because they can't just spin those things off tomorrow.

Dorian: There are a lot of people that depend on those things. But they're going to wake up one day and realize that companies that are solely dedicated to solving those problems deliver a better developer experience and a better service than their five-person team that spends a lot of time listening to businesses but never spent one day talking to the developer that uses it. That's why I think the API economy is going to leapfrog serverless. It's the developer, stupid. If you ignore developer experience, you will fail. 100%. I don't care what you do. Eventually you're going to fail. Someone will come along and crush you. But that's why I think it will leapfrog is just that you're able to offer this incredible developer experience and accelerate the outcome by an order of magnitude. It's mind blowing, right? And the people working on it are amazing. All the people working in this space of the API economy are just like me. They've had similar experiences. They thought of this years ago. And they want to enable a good developer experience. And so to me that says they're going to win. At least in my mind.

Rebecca: You have this great quote that I'm just going to read directly because I wouldn't be able to put it better myself. I might read it, drift comma, unbudgeted spending comma ... No, I'm kidding. I'll read it like a regular person. "Drift, unbudgeted spending, defective design, and a failure to read the fine print of service limits are a few examples of how IaC can become a countermeasure to serverless success." I think that's the most condensed version of what you were just telling us about. These moments are like the countermeasures. And I'm wondering if you see any path toward reversing that for IaC. So it sounds like the API economy is going to leapfrog, but is there a path where you're like actually, if you did take step one, step two, step three, these are the ways that I see this coming back into play?

Dorian: Great question. Yeah, totally. No, I think there is. I actually met with Austin a long time ago and I told him this directly. I was like, "What you need is reference architectures that are easily deployable that solve a use case." So I need event streaming because I want to monitor telemetry data. That's what I want to deploy. And I don't want to configure anything. I want it pre-configured with good defaults. And so that would be an example of where you could use serverless and build an infrastructure internally without using an API. You're still going to probably get an API at the end of the day. That'll probably still be your entry point. But you can just go through a catalog and select the reference architecture for the use case you're trying to solve. I need video streaming. That's another one.

Dorian: Or like I need a ... I mean, obviously we already have solutions for single page web applications, but the point is, I need this composable system where I can get a best practice reference architecture that solves real use cases and the defaults are turned on so the developer doesn't have to think about it. And if there are changes that need to be made to that architecture, they can be rolled out seamlessly without breaking the contract between the thing that's consuming that reference architecture so that you're not causing the drift problems. But that's a hard problem to solve. It's not easy to solve that problem. And especially if you're thinking multi-cloud, that's an even bigger problem.

Jeremy: Right, right. Because your infrastructure is code, which again, people push for. And I think infrastructure as code is the best sort of scale solution we have right now, or the best widely accepted solution we have right now where you get repeatable builds. You certainly don't want people going and configuring things with the console. But IaC, like you said, essentially means you're writing AWS control plane or Azure control plane specific configurations. You're picking those primitives. And code is a liability. We talked about this with Ajay Nair when we had our live show at re:Invent. And basically, code is a liability. The more code you have to write and the more things you have to do. And then again, as things change, as they add new features, as Lambda adds a new switch to do something else, all of your code doesn't have it. So now you default to whatever their default is and then you might have to go back and update thousands and thousands of little snippets in your IaC code, just so that you can go in and add this new feature or maybe even have a different behavior that may have changed by launching something new.

Dorian: Oh my God. Yeah. re:Invent just happened and I'm still getting through the videos at this point of things I want to watch. But sometimes you just go, dude, AWS has run five laps around me. And then they launched, I don't know how many new ML related things that I'm still learning about. And I don't know if they realize they're just outpacing the developer and our problems haven't changed. The problems I'm trying to solve are the same they were five years ago. But you're releasing 50 new services that all have overlapping functionality and which one do I choose? EventBridge would be a good example. Do I use SNS? Do I use SQS? Do I use EventBridge? I don't know, because no one's really [crosstalk 00:37:38]-

Jeremy: Look at some of the new patterns. I don't want to interrupt you, but some of the new patterns, just because that hits a nerve with me, because a lot of the new patterns are like use EventBridge with SNS and SQS. Like you add all these things to ... Just keep composing and composing more services.

Dorian: How about using RabbitMQ? I'm going to use RabbitMQ. Forget it.

Jeremy: RabbitMQ pushes to EventBridge, which sends off an SNS that goes to SQS that then is processed by a Lambda. And then yeah, I mean the architectures get complex and like you said, they overlap.

Dorian: Exactly. And so you really need a team of professionals who specialize in that. I mean, that's the beauty of the serverless framework. I mean you can get a company that's supposed to specialize in this. Developers can't be that. I train developers. It's part of what I do. But I could spend the next 10 years training them on AWS and I'll lose all of the productivity related to training them on our business domain and delivering value there. So why would I focus my time on a valuable resource that needs to deliver growth to our investors on delivering growth opportunities for AWS? I'm not going to. I'm going to find every workaround I can to make sure that every minute that that developer's spending is delivering actual value for Brainly. And it's not infrastructure as code. So yeah.

Rebecca: I think we're getting to this moment of this whole topic that we wanted to talk to you about, which is what's the next evolution, not just of serverless ... Partially of serverless of course. It's Serverless Chats. It's in our name. But also cloud development. And I think part of that is sometimes to move forward, you kind of need to look back and it sounds like in a way the future of cloud development should also be solving my problem that I still had five years ago.

Dorian: Absolutely.

Rebecca: I think you articulately described the problem statement as you would put it in your own words to us. And so instead of taking those, I would love for you to frame the problem statement as you see it in terms of where serverless and cloud development should go next. And then we'll dive into more topics around that.

Dorian: All right. If we start at the lowest value stage, which is sort of how we build software. And I think we've nailed that in the sense that we have a best practice way. We can compose functions into a way in which we can do complicated things. And I think we've known that for a long time whether you look at sagas or whether you look at Step Functions and state machines. Interesting side note, Brainly is hugely adopting finite state machines across all layers of its stack, front end, back end. This is something that's going to, in my opinion, make a huge comeback as we really embrace this idea of composing functions. You've got to have some way to control those things. So at the lowest level, it's really the design patterns that we've had since the '70s and appropriately applying those and making sure that we don't get ourselves into the big ball of goo problem.
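To illustrate what a finite state machine buys you here, a minimal hand-rolled FSM sketch in TypeScript is below. The states and events are invented for the example; this is not Brainly's implementation, just the general shape of the pattern.

```typescript
type QuestionState = 'draft' | 'submitted' | 'answered' | 'closed';
type QuestionEvent = 'SUBMIT' | 'ANSWER' | 'CLOSE';

// Every legal transition is declared in one table, so illegal ones simply don't exist.
const transitions: Record<QuestionState, Partial<Record<QuestionEvent, QuestionState>>> = {
  draft:     { SUBMIT: 'submitted' },
  submitted: { ANSWER: 'answered', CLOSE: 'closed' },
  answered:  { CLOSE: 'closed' },
  closed:    {},
};

export function transition(state: QuestionState, event: QuestionEvent): QuestionState {
  const next = transitions[state][event];
  if (!next) {
    throw new Error(`Illegal transition: ${event} from ${state}`);
  }
  return next;
}

// transition('draft', 'SUBMIT') === 'submitted'
// transition('draft', 'CLOSE')  -> throws, because the table never allowed it
```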

Dorian: So that's probably the lowest level, but the next level up from that is this idea of how do I bypass the control plane? And I think that's where the innovation is going to be. And so I can give you an example of a company you probably would never think of that would be creating something like this and they're not even using serverless technology. And the only reason I want to talk about this is because it's an insight into where things might go as we try to solve the problem of how do I bypass the control plane and get this really beautiful architecture that we spend a lot of time figuring out that allows us to compose software? How do we get that to run everywhere? Because we really have to think about this.

Dorian: If you distill it down, when I say run everywhere, I do mean run everywhere. I mean, there is an edge network being built around the globe right now that involves code being able to run at almost any place. Within cities, like within data centers or limited to cities, or whether it's running on cell networks or whether it's running on drones or whether it's running-

Jeremy: Cell towers. Right. Yeah. It's crazy.

Dorian: How about this example? StackBlitz, right? They have WebContainers. Is your laptop now an edge deployment? I don't know. Maybe. It could be. So the problem is growing at this crazy exponential rate and we want our code to run everywhere and we don't want a control plane in the middle of that. But a company that has thought about this is Palantir. And Palantir has this product called Apollo.

Dorian: And when I was looking at their diagrams and how they made it work, they inverted control over the CICD system. So a typical CICD is going to contain the details about how you push code out to a particular environment. What they decided to do was invert control and they created these processes that run in those specific deployment targets that contain the protocol of how they ship software. So they eliminate the need to centralize all of that detail about the deployment target by moving it into these agents that are running within the deployment targets out there on the edge and out there in these various clouds, and they have achieved multi-cloud and edge using that approach. That inversion of control is genius. And I think we could look to ways we could apply a similar type of solution to run serverless code.

Dorian: They're like little mini runtimes that have figured out how to integrate with the CICD system and they're using basically a polling method to figure out when they should pull changes, as opposed to this push system that we have built all of our CICD systems around. So I think it's going to take an innovation like that, where we decentralize the model. Because right now the centralized model of IaC and CICD just isn't going to scale to that level, if that makes sense. I don't know if that sort of makes sense on where I see things going.
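A rough sketch of that pull-based, inversion-of-control idea in TypeScript is below. It is not how Apollo works internally; the release endpoint, payload shape, and deploy step are all hypothetical, and it assumes a runtime with a global fetch (Node 18+ or similar).

```typescript
interface Release { version: string; artifactUrl: string; }

// The agent asks a central release channel what the latest version is...
async function fetchLatestRelease(channel: string): Promise<Release> {
  const res = await fetch(`https://releases.example.com/channels/${channel}/latest`);
  return res.json() as Promise<Release>;
}

// ...but the knowledge of HOW to deploy lives here, inside the deployment target:
// pull the artifact, swap containers, run smoke tests, roll back on failure.
async function deployLocally(release: Release): Promise<void> {
  console.log(`Deploying ${release.version} from ${release.artifactUrl}`);
}

// The polling loop replaces the usual push from a centralized CI/CD system.
export async function runAgent(channel: string, intervalMs = 60_000): Promise<void> {
  let current = 'none';
  for (;;) {
    const release = await fetchLatestRelease(channel);
    if (release.version !== current) {
      await deployLocally(release);
      current = release.version;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```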

Jeremy: No, it does. And actually, I mean, this is one of the things where you think about cognitive load for developers and you first start talking about you just check some code into GitHub and some process will run and it will magically appear somewhere and maybe you'll get some errors in Sentry or something and you can debug it later. But then you switch over to this idea of serverless and you're like, well, you're actually now deploying, not just this snippet of code that runs, that may be part of a larger orchestration system or part of a choreographed eventing system or whatever. But now you're also putting connections in between the code where you say, well, if this SQS fails, then there's a retry and there's a DLQ and you've got all these other sort of crazy things going there.

Jeremy: So that's a huge cognitive leap right there. Now you start saying, oh, by the way, you're deploying to 5,000 edge nodes with every deployment that you do. And by the way, it's going to have to be geographically aware and then you might have some data requirements where only some data can be stored here and you can't replicate it there. And then knowing whether or not it fully deploys. It's insane. We haven't even gotten to the point where people fully understand the serverless mindset or whatever you want to call it. Getting them into an edge mindset is even crazier. But I'm just curious in terms of ... I get what you're saying but I'm-

Dorian: That's why decomposition's important, right?

Jeremy: Right.

Dorian: But you got to decompose the problem. And the way you do that is you make separate companies that worry about specific domains of that problem. And then your team is interacting in that API economy and in that ecosystem that has now emerged with lots of independent companies that focus on that one little aspect of making that work across an edge network. And they distill it down into something that you can consume and compose into your system because developers are still ... At the end of the day, they need to focus on the use case for their app. And that's the only way we're going to get there is through decomposition of the problem. So that's it. But this concept of leveraging the edge network and deploying everywhere, no one's figured it out. There is no solution to that.

Dorian: There isn't even a solution to the 2017 problem of multi-cloud yet. Anywhere. Whether it's Vendia, whether it's the serverless framework. I don't care who you are. No one has figured out how to do that yet, because it's really complicated. And even Terraform has gaps in there. And it's debatable even to the extent you could ever have an abstraction layer at the rate the cloud providers move. So I think that we have to embrace this idea that we need separate entities that solve these problems individually, and that you need to invert control to basically say you build a protocol that is deployed within these various structures and you realize that protocol in the form of a container that runs and it's going to encapsulate the details of how to deploy that thing. Rather than trying to centralize it into your CICD system. I just don't think that's going to work in the long run.

Jeremy: Right. And then you have the idea of having specialists that can apply best practices to those specific pieces of the use cases, and then maybe a layer on top of that that says, take these APIs, assemble them into another best practice for a use case and kind of stick all that stuff together.

Dorian: Yeah. We need an ecosystem that's opinionated about how you do stuff. I think at the end of the day, that's really what we're saying is the way we operate today, there's no opinions, it's up to you to kind of figure out best practice. You're using a lot of-

Jeremy: You're reinventing the wheel.

Dorian: Yeah.

Jeremy: You're reinventing the wheel. Developers reinvent the wheel all the time. It drives me crazy.

Dorian: Every day.

Jeremy: Yep.

Dorian: Every day. And then you're also building these things and then AWS, like you mentioned, changes the rules or builds a new service that's supposed to replace this thing. It's just this never ending problem. And as a developer, we can't be focused on those things. We need to focus on the things like delivering value to our users. And so I do think that this API economy is going to win. I see no world in which infrastructure as code sort of remains in the stack the way it does today five years from now. But in the interim, it's sort of like, what do you do? And what I've been doing is sort of embracing the lowest value stage on AWS possible, which is why we run a lot of stuff in container services already.

Dorian: Like, how do we properly leverage open source and just shift the conversation out of AWS until this better economy, better way of doing things eventually emerges? And if you're running a startup, what I would say is use Vercel and use Fauna and use Shopify. Don't ever write a line of IaC because you're going to be in this incredibly advantageous place while all the rest of us are trying to figure out how to get there. Or having to build open source stacks that run on EC2 or something like that. But I think that the future, once we do get there, is that serverless is just the default, because the future is the edge and you will compose software. You could never have a company that had enough domain knowledge to figure out that edge network. So it's going to have to emerge that way.

Rebecca: So before we go any further, I definitely want to ask, I really appreciate how you're like, "All right. Use Vercel, use Fauna, use Shopify." And so I think there are two things that I appreciate about how you write when you're evaluating a service or evaluating something new. And you'll also say, "Hey, here's what it's supposed to be good for and then I'll come back when we've evaluated it further." So I think there's one part of the question, which is like, how do you and your team ... How do you create those evaluations? What are you looking at? And then the second question specifically about GitHub Codespaces, which is something that you said, "Hey, I'll come back when we decide more about it." It's described as a way to eliminate the works on my machine problem. And in my head, that's going to kill a lot of really great memes. So I want to know what that evaluation came out like. What's your take on GitHub Codespaces? And are you okay with those memes becoming a thing of the past?

Dorian: Awesome question. Our process at Brainly is we create this RFC document and we open a Slack channel and then we invite all the key stakeholders. And in the RFC we summarize everything that we've figured out from the research project. And then we do a vote up or down on whether to adopt the technology. And so that's the general process and it's worked out pretty well. I mean, you could argue that that may slow the rate of you adopting things, but at the same time, it also protects you from the problem of adopting things that don't work, which is often a bigger problem. But that's our process. But with GitHub Codespaces in particular, the problems we're looking to solve ... I have a couple of them. But the it works on my machine problem is probably the number one source of preventable slowdowns we get on Slack every day.

Dorian: It's like some tool that doesn't quite ... A dependency's broken or whatever. So environments have been a big problem. What are the ways we've tried to solve that in the past? People have built machine images. That's kind of a no-go today for small organizations that don't have massive IT teams to maintain those things, keep them in sync. Other people have tried to write automation in Bash or other types of scripting technologies that try to keep the environment in sync, but even the Bash scripts have the it doesn't work on my machine problem. So that hasn't really been a solution. And so when I saw ... Virtual IDEs aren't new. Cloud9's been around for a while, but the developer experience was pretty poor in Cloud9. Anything that's going to basically run a video stream over the network isn't ideal.

Dorian: And the beautiful thing about Codespaces is they've split the IDE into the part that runs in the browser in the client, and then the virtual environment that's doing the heavy lifting in the background. That's the right model. You get really good performance from a developer perspective as you're writing code and it feels pretty native. It doesn't feel like you're running on this VM somewhere and it sucks. So I think they did a good job there. The environments aren't using effective caching. And what I mean by that is with StackBlitz WebContainers, installs are almost instant. And they've figured out this really sophisticated caching system to make that work. And with Codespaces, it's slow. Like as slow as it would be on your workstation. And the environments aren't pre-warmed, so a lot of times if your environment shuts down or you're coming back to it, it takes a while for it to boot up and get rolling.

Dorian: So that's kind of frustrating. I don't ever turn my IDE off pretty much. And when I sit down it's instant. I can just start writing my code. Having to wait all this time, that's actually a problem. Multiply that problem times a hundred people. My productivity has gone down, not up. That's a serious problem. The other issue with Codespaces is there are still it doesn't work on my machine problems. And the reason is that their security model is bad. They decided to leverage SSH keys and .npmrc files to do authorization. So now I still have to configure a certain amount of local setup. And because the VM is scoped to the local developer, they can do whatever they want on there. They can install all kinds of dependencies you didn't know were on there.

Rebecca: But save the memes.

Dorian: You've still created-

Rebecca: Save the memes.

Dorian: You didn't solve the problem. Right? Yeah. So okay, we've just gone around the block. We made four right turns and we're back to where we started. So that was my impression.

Jeremy: That's where I wonder too. Because again, with Serverless Cloud, we took this approach where, look, use a local machine and write your local code and then we'll sync that code into an actual production ready or production compatible instance where you're actually running code as it will be or in an environment where it will be the same as when you publish that to production. And SAM Accelerate, for example, has done this thing now where they do something similar where they're uploading code. Of course, you have to create the environments yourself. It's not quite as easy of a workflow, but same idea. So I'm curious though, with something like Codespaces and StackBlitz, which is awesome, by the way. I played around with it. It was very cool. But you're right. You're kind of creating your own thing that you can really mess up and isn't necessarily what's going to be what actually gets produced. But I mean, is there a happy medium? Do you think that these online IDEs can help with productivity? Can they solve some of these problems? Do you think people still need local? They still want to use all those local tools? I mean, what's your thought on that?

Dorian: I think the local IDE is king and it's going to be king for quite a long time for hardcore engineers that spend 12 hours a day writing code. I just don't see that going away anytime soon. And it's because ... Think how much time I spend ... Our teams spend a lot of time just writing documentation on developer hygiene about what keyboard shortcuts you need to know how to use and what IDE plugins you are required to use. It's all to maximize the throughput of the code. But I mean, minutes matter. And so I just don't see a world in which you can get the same productivity. StackBlitz is the leader. So if there is an IDE that is web-based, that could actually solve this problem, it's probably going to be StackBlitz.

Dorian: And they were smart too, because they have an integrated browser. So that was the absolute deal killer with Codespaces. They don't use an integrated browser so hot code pushes are being spit out to this remote URL that you're on, but it can take 30 seconds for it to show up. And sometimes it doesn't show up. Versus half a millisecond. And it always shows up on my local. I can't even get to the browser tab fast enough to see it refresh when I'm using a local IDE. And StackBlitz has figured out how to do that right so that you don't lose that productivity. Now again, imagine that productivity slowdown multiplied by a hundred people. You're talking about a very serious productivity loss. So yeah, I don't think that the virtual IDE is useful there, but where I do see it useful, and this is still something I'm exploring, is for bug triage.

Dorian: So like, how do we get an environment that reproduces a bug? That's a good one. And it also allows you to leverage people who may be junior software engineers going through an apprenticeship program to learn your code base and how you build software, an opportunity to provide actionable bug reports to people so that your bugs aren't sitting in the backlog for two months or three months or whatever it is. Because no one knows how to reproduce the issue. So I think that that's a really good use case for an online IDE. That sort of thing.

Rebecca: I think there are two M's that matter a lot. Minutes matter and metrics matter. And also love matters, but it's not an M.

Dorian: Right, right, right.

Rebecca: Well, gentlemen-

Jeremy: Minutes, metrics, and love.

Rebecca: Minutes, metrics, and love.

Jeremy: That could be a movie.

Dorian: There you go. That's the new developer T-shirt and sticker, man.

Jeremy: It's probably already green lit on Netflix.

Dorian: There you go.

Rebecca: Somewhere Netflix is like, you got to take this podcast down. We trademarked that. Copywritten.

Dorian: Oh god.

Rebecca: Well, gentlemen, we have arrived at the end of our very first podcast of 2022 with Dorian Smiley. Thank you so much for being here and sharing your knowledge with the community. How can our listeners find out more about you?

Dorian: I'm on LinkedIn. That's about all the social media I do. And Medium. Find me on Medium. Come visit our Brainly tech blog and brainly.com. Look at our careers. We're hiring. If any of this sounds fun to you, come check us out. But yeah, Medium and LinkedIn.

Rebecca: And GitHub too. You got your own /doriansmiley.

Dorian: Yes. GitHub. I'm all over it. Yep.

Rebecca: Yeah. Like don't forget that.

Dorian: GitHub. I almost forgot. Yes, I'm on GitHub.

Rebecca: Awesome. Well, thank you so much. And we will put all your info in the show notes as well. So people can just click on that link and go right to you.

Dorian: Awesome. Thank you guys. I really appreciate it.

Rebecca: It's been fun.

Jeremy: Thanks Dorian. Have a good one.

Dorian: Bye.