Episode #124: Self-Provisioning Runtimes with Shawn "swyx" Wang

February 14, 2022 • 65 minutes

On this episode, Jeremy and Rebecca chat with Shawn "swyx" Wang about workflows as code with Temporal, self-provisioning runtimes, the intersection of cloud and serverless, the need for developer experience roles, and so much more.

Shawn "swyx" Wang is currently Head of DX at Temporal.io, based out of Seattle. He is also a frequent writer and speaker best known for the Learn in Public movement and recently published The Coding Career Handbook with more advice for engineers going from Junior to Senior.


This episode is sponsored by Stream.

Transcript

Jeremy: Hi, everyone. I am Jeremy Daly.

Rebecca: And I'm Rebecca Marshburn.

Jeremy: And this is Serverless Chats. Hey, Rebecca. How are you doing?

Rebecca: I am doing well, but I feel like you always ask me that first. So, I'm going to say, Jeremy, how are you doing?

Jeremy: I am excited. I am extremely busy lately. We've been in a content creation, just trying to get content out, and it's been crazy, but I'm excited for our guest today.

Rebecca: I know you are. To give listeners a sneak peek, I have never heard Jeremy talk so fast and we all know that's really impressive for him.

Jeremy: That's super fast, right?

Rebecca: It's really exciting, and without further ado, really excited to introduce our guest today who is the Head of Developer Experience at Temporal Technologies, Shawn "swyx" Wang. That being said, he's going to be here to talk to us about the holy grail and what that means at the intersection of two very special parts of a very special blog post and all these other things that we're going to get into. Hey, Shawn. Thank you so much for joining us.

Shawn: Oh, thanks for having me. I'm excited to be on. I've been a long time listener, so excited to be a guest for the first time.

Rebecca: Ooh, well, we're excited to have you here. Sorry, Jeremy.

Jeremy: Yes, we are. No, that's all right. No, I was just going to jump in. I mean, honestly, nothing against other guests. They all have been fabulous, but I am super excited to have you here today, Shawn, because I think we share a lot of the same philosophies. We'll get into some of those things, but before we get into this and I dominate the conversation with what I want to talk about, let's talk about you for a second. So, just in case listeners don't know you, which I think is just an absurd statement, but if they don't, tell us a little bit about yourself and what Temporal does.

Shawn: Yeah, sure. So, I'm Shawn, Head of Developer Experience at Temporal. And previously, before that, which is probably going to come up in the near future in this conversation, I worked at Netlify and AWS as well, mostly on the serverless and JAMstack side of things. Temporal is a workflow engine, which I never really thought was a thing that I would need until I thought deeper about the kind of work that really requires high reliability. So, really, a workflow engine essentially does anything long running and it does... It's responsible for microservice orchestration or serverless function orchestration, doesn't really matter. Just the general purpose like what's going on? Where is the work being carried out right now? Okay, that work is done, what needs to happen next? That sort of long running job processing.

Shawn: I think it's a general pattern that once you see it, you see that most people have built some version of this in some kind of ad hoc manner most of their lives. And this is the most advanced framework that I've found basically because it came out of Uber where it was developed to serve Uber's needs and everything from messaging to Uber Eats to driver onboarding, and basically like 300 something use cases at Uber. And then it was open sourced and it got really good adoption at places like Airbnb and Coinbase and DoorDash and Snapchat and Stripe and Netflix and on and on. And so, that's why I was really bought over by it because it could possibly be that this is a better mouse trap. I'm just really excited by that sort of thing. So, I could go on about the why about it, but that's the long and short of it. It's a workflow engine that lets you write workflows as code, and I think the melding of infrastructure and languages is why we're here today.

Jeremy: Yeah. No, and actually, I want you to go on about it because this idea of workflows as code I think is amazing. I mean, we've got a couple different services out there. There's probably hundreds of services that we don't know about out there that are doing similar things. But I think the ones that come to mind for me that I compare Temporal to at least, I guess, from an execution point would be like Durable Functions with Azure, and Step Functions with AWS. Of course, Durable Functions are, again, I guess, workflows as code as well. You write them within a function to orchestrate different things. Whereas Step Functions are more of a DSL and you've got to have a specialized language in order to do that, but talk more about workflows as code and why that makes more sense because I like Step Functions.

Jeremy: I mean, I really like Step Functions. They do... The service itself is great. The interface to it though. I mean, essentially they had to build a workflow studio or whatever they call it because it's not easy necessarily to write the DSL for it. It's not as intuitive, I think, especially to developers, which I mean, we get into the whole point of developers now building infrastructure or provisioning infrastructure. But I think that for developers, the idea of using it as code makes more sense, which is why I actually really like Durable Functions, which in turn is why I love what Temporal's doing.

Shawn: Yeah. I should start off by noting the intellectual history here. So, it's really, really a small world here because there's only a small set of people who have been doing this for two decades at this point. So, our CEO was actually the original tech lead for AWS SQS back in the day. And then the tech lead for AWS Simple Workflow, SWF, which was the precursor to AWS Step Functions. So there's that intellectual line of history. And then there's our CTO wrote Durable Task Framework, which became the basis of Azure Durable Functions, so both our CEO and CTO, both of them came from this background and then did it all over again at Uber and felt like they finally found the right abstraction, which became Temporal today. So, that's the historic context.

Shawn: I love this history because it's always people behind the technology. And once you understand superficially, you're looking at docs, you're looking at API design, and actually there's a people story behind this. So, I love telling that story. But yeah, why model durable functions? Why model processes and long running processes as code? Basically, I think, for me it's an argument by language design, right? Like right now, if you're using some kind of domain specific language like AWS Step Functions, Google Workflows also has a similar arbitrary language that they've designed for themselves. You're basically limited by what the designers of that language have allowed you to do and you have to... So there's a learning curve cost, but also there are system restrictions where you have to get permission from someone to model some kind of business logic that you can do.

Shawn: And right now our point of view is that this kind of orchestration code is basically completely arbitrary, and you need the full expressiveness of a Turing-complete language to do that. And so, either you can wait for someone to invent their way into a Turing-complete language and have you write the AST in some kind of workflow studio or JSON or YAML, very verbose config language, or you can use a programming language that you're familiar with, use libraries and frameworks that you are familiar with as well, test them and version control them, and syntax highlight and lint and all the other stuff. Use programming languages, use general purpose programming languages to write general purpose business logic, and that's where we are at with Temporal.
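
To make Shawn's point concrete, here is a minimal sketch in plain Python of orchestration logic written as ordinary code. It does not use the Temporal SDK; the `Order` type and the `charge_card`, `reserve_item`, `refund_card`, and `ship` activities are hypothetical stand-ins for calls to other services. It only illustrates that loops, branches, and early returns come for free in a general purpose language, whereas a JSON or YAML DSL needs a dedicated construct for each.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    order_id: str
    items: List[str]
    log: List[str] = field(default_factory=list)


# Hypothetical activities: in a real system each would call another service.
def charge_card(order: Order) -> bool:
    order.log.append("charged")
    return True


def reserve_item(order: Order, item: str) -> bool:
    order.log.append(f"reserved:{item}")
    return item != "out-of-stock"


def refund_card(order: Order) -> None:
    order.log.append("refunded")


def ship(order: Order) -> None:
    order.log.append("shipped")


def order_workflow(order: Order) -> str:
    """Orchestration as ordinary code: branches, loops, early returns."""
    if not charge_card(order):
        return "payment_failed"
    for item in order.items:  # a loop needs no special DSL construct
        if not reserve_item(order, item):
            refund_card(order)  # arbitrary business logic on failure
            return "out_of_stock"
    ship(order)
    return "completed"


ok = Order("o-1", ["book", "pen"])
bad = Order("o-2", ["book", "out-of-stock"])
print(order_workflow(ok), ok.log)
print(order_workflow(bad), bad.log)
```

Because the workflow is a normal function, it can be unit tested, linted, and version controlled like any other code. What a workflow engine adds on top, and what this sketch leaves out, is durability: surviving crashes and resuming where the work left off.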

Jeremy: Yeah, and I want to let Rebecca get a word in here because I don't want to dominate this conversation.

Rebecca: I am along for the ride today. I am excited.

Jeremy: But I love this idea, like what you said, this being able to express it in something that's more imperative type code, right? It's something you're familiar with. And honestly, I think this is where people get confused and we'll get more into interpreting intent or interpreting the runtimes and things like that. But this idea of saying that, well, you shouldn't orchestrate these complex things in code because something could happen there. I think you're missing the point. The code itself is just a way to express what you want to do. It doesn't mean that's the way that it actually gets implemented behind the scenes, that you actually have to run the server somewhere and run through this and hope it doesn't break. It's just about expressing that intent.

Shawn: Yeah, totally. There's nothing else I can add to that because that's exactly what we do behind the scenes.

Rebecca: So, the name of our podcast is Serverless Chats, which in no way compels us to talk only about serverless. But I think what is really exciting is we like to talk about abstractions and how technology has come far enough today that it allows us to build the things we really want to focus on. But I also want to get back to the tactical just for our listeners to help level set for them. If you could talk a bit about how you all think about and use serverless at Temporal and whether or not, how some of those conversations evolved, or if it's more of like a philosophical way of thinking or some of the tactical ways that you also apply it while there?

Shawn: Yeah. We're not opinionated about that. So, serverless, we ourselves are not serverless, but I view Temporal as the single stateful service in your entire tech stack that lets everything else be stateless. And that makes for very wonderful serverless properties. It lends itself very, very nicely to serverless properties, but it also lends itself nicely to a microservice architecture. We don't actually care about the size of the service, whether it's a macro service or a nano function, whatever the kids are calling it these days. The point being that you're going to have a bunch of different teams working on different functions all over the place, and they need to be coordinated. Or in other words, orchestrated one after the other, parallel. You need to branch out work. You need to join up work. You need to block on one piece of work being done before the other completes. You need a framework to organize all of that, and to make it easy to version and test and migrate all of that logic as well.

Shawn: So, for me another... This is me again with the narrative storytelling, the way that I found my way to this company was I was working at AWS and I was essentially a salesperson for AWS's serverless capabilities. I was at AWS Amplify, which is entirely... It's serverless plus JAMstack and [inaudible 00:10:20], and stuff like that. And I was like, "All right, this is pretty good tech. It's pretty scalable." This all seems like a solved problem. There are five or six different companies all doing the same thing. What is not solved? And so, what I did was... When you come to spotting opportunities, one of the mental models and frameworks that I like to do is to do a jobs to be done analysis. Basically, just go through and figure out what is still not very well solved.

Shawn: I was breaking down the jobs of a monolith, or jobs that we use computers to do. And the thing that I kept running up against was long running jobs and long running, you think it's video processing. And you're like, "Okay, if I don't do video processing, I don't really care about long running or not." But no, actually long running when it comes to serverless, what's the default time out of a Lambda, like 15 seconds, like five seconds? I don't actually know. Anything longer than that you start to need to do all sorts of contortions to string logic together to actually get stuff done.

Shawn: So, I realized that actually in the serverless land, serverless is really good at short-lived tasks, and scaling up really quickly, and spinning things down really quickly. It's not really good at the whole orchestration bit. That's why AWS Step Functions is so loved. But then there are developer experience ways to improve upon that. And there are other scalability, and other API design metrics to improve upon that experience as well. So, yeah, I mean, so I wrote a blog post essentially about what I thought was missing in serverless and Temporal found me through that and hired me for the job. So, I think when you figure... When you just constantly knock at the edge of what you think is still not good enough, you tend to attract people who think the same way. And that's why I think Jeremy also read this in some of my writing.

Jeremy: Yeah, no, I mean, in terms of... First of all, there's a whole... I think we actually had a podcast about writing good blog posts, getting you hired at other places. That's another thing I just can't recommend it enough. If you have a problem with the service, and again, as someone who has built a number of products, please tell me what's wrong with it. I want to know. Tell me where those edges are, but just going back to what you said about those long running processes, too, is that's one of those things where I think there are so many of these use cases that come up for these different types of long running process. It's not just video processing, like you said, but it's everything from just guaranteeing that something will happen. And one of the biggest things that I always argued for, and I told AWS this a million times is the circuit breaker pattern. Do not let people make external API calls without building in a circuit breaker for them. Because again, writing that yourself is such a pain. So, I don't know. Maybe we could just talk quickly about some of those other use cases that Temporal solves.

Shawn: Sure. Actually, so I'm not exactly familiar with the name circuit breaker. What are the components to the circuit breaker?

Jeremy: Oh, I'm sorry. No, the circuit breaker pattern is just that if you try to make an API call and that service is down, rather than you basically keep trying to make the same API call over and over again, because again, we know that usually when services start to run slow, the worst thing you can do is send them more traffic. So, the circuit breaker is a nice way to be a netizen, a good netizen. I don't know. I hate that term, but anyways, to be a good internet citizen, I guess, and to start limiting how much you send to it, but you don't want to be dropping your own calls. So either you're queuing them on your side or you're sending a call every minute or every five seconds or whatever it is just to see if it's back up. And essentially once it starts responding quickly again, then you can start flowing your traffic through. And they eventually did this with API destinations in EventBridge that handles the throttling as well as quotas. It took them a while to get to that point. But again, you have to use that one specific service, but I mean, just in terms of other use cases that are there. Like Temporal, I know you talk about microservice infrastructure or microservices orchestration, things like that.

Shawn: Yeah. So by the way, so I call that... I think about that in terms of exponential backoff. So, the simplest algorithm for spreading out all your API calls into some kind of smooth load instead of a spiky load, whenever you try to retry everything that failed is you have exponential backoffs for everything, and they tend to sort themselves out over time, which is a really nice property. I'm sure there's some math behind it, which is really fascinating.
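
The pattern Jeremy describes can be sketched in a few lines of Python. This is a minimal illustration, not how EventBridge API destinations or Temporal implement it: the `CircuitBreaker` class, its `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions made for the example. After enough consecutive failures the breaker "opens" and refuses calls; once the timeout elapses it lets a single probe call through to see if the service is back up.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: half-open, so let one probe call through below.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed probe
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


# Demo with a fake clock so nothing actually sleeps.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=5.0, clock=lambda: now[0])


def flaky() -> str:
    raise ConnectionError("service down")


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

try:
    breaker.call(flaky)  # refused without touching the service
except CircuitOpenError as exc:
    print(exc)

now[0] = 6.0  # the reset timeout has passed
print(breaker.call(lambda: "ok"))  # the probe succeeds and closes the circuit
```

The sketch drops refused calls by raising an error. Queuing them on the caller's side, as Jeremy suggests, would be a layer on top of this.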

Jeremy: The AWS jitter algorithm?

Shawn: Yes, jitter as well. Exactly. Yeah. So yeah, so we're really good for that. Essentially, what I call that use case for us is reliability on rails. In other words, imagine every single team making any API call, any external service call, any service call to a different team's service. I don't care. Anytime you're crossing system boundaries, you need to set that up. That is a basic production requirement that you need to handle retries and timeouts. And don't forget timeouts as well because that's additional scheduling. So, in other words, why don't we, instead of every single team managing that reliability infrastructure, we centralize it as a service and then provide that to the rest of the company. Which is often something I'm seeing in the big companies that we engage with that there's a central platform team that is responsible for the Temporal offering internally. And then all the other engineering teams are just customers of them. That's a really nice pattern because then they just get to focus on their service and then the reliability guarantees are centrally orchestrated.
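
As a rough illustration of the retry behavior discussed here, the following Python sketch combines exponential backoff with the "full jitter" variant, where each delay is drawn uniformly between zero and a capped exponential bound. The function names, the default `base` and `cap` values, and the injectable `sleep` are assumptions for the example, not Temporal's retry policy or any AWS SDK's implementation. Timeouts, which Shawn also mentions, are left out to keep the sketch short.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def retry(fn: Callable[[], T], max_attempts: int = 5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds or max_attempts (must be >= 1) is used up."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")


# Demo: fail twice, then succeed. Delays are recorded instead of slept.
calls = {"n": 0}


def sometimes_fails() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("try again")
    return "done"


delays: List[float] = []
print(retry(sometimes_fails, sleep=delays.append), len(delays))
```

The randomness is what smooths the load: without jitter, every client that failed at the same moment would retry at the same moment too.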

Shawn: Okay. So, that's the general category of microservice orchestration, but you can also use us for a couple other things, which I'd like to highlight. So, distributed transactions is one of them. Essentially, just the blocking and tackling of having a lock here before you execute the other side of this transaction and then unrolling both of them if the other side of the transaction fails. So, this kind of thing is really important for something like a Coinbase or a Box, both of which are users of us.

Shawn: If you think about it, the easy one for me is actually the Coinbase one because cryptocurrency transactions versus fiat transactions. Fiat transactions actually execute right away compared to cryptocurrency, which actually takes some amount of confirmation time. And so, imagine you have to hold them before the fiat currency goes through. So, doing that across systems is the generalized problem with distributed transactions. And similarly for Box, even though it's just file transfer, doing it at Box's scale means that you're actually spanning a number of different systems and you need to be transactional because imagine if you had deleted a file here and failed to reproduce it somewhere else, or vice versa, you had two copies of the file somewhere. So, just generically making that go away because you build it into the framework is a really nice property to have.
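
One common way to get the "unrolling" behavior Shawn describes is saga-style compensation: every step is paired with an undo action, and a failure triggers the undo actions of the completed steps in reverse order. The Python sketch below shows that technique under stated assumptions. It is not how Coinbase, Box, or Temporal implement transactions, and `run_saga`, `hold_fiat`, `release_fiat`, and `send_crypto` are hypothetical names for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Each step is (name, action, compensation that undoes the action).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    history: List[str] = []
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            history.append(f"{name}:failed")
            for undo_name, _, compensate in reversed(done):
                compensate()
                history.append(f"{undo_name}:compensated")
            return history
        done.append(step)
        history.append(f"{name}:done")
    return history


# Demo: hold fiat funds, then fail to send the crypto side of the trade.
balances: Dict[str, int] = {"fiat_held": 0}


def hold_fiat() -> None:
    balances["fiat_held"] += 100


def release_fiat() -> None:
    balances["fiat_held"] -= 100


def send_crypto() -> None:
    raise RuntimeError("network did not confirm")


def noop() -> None:
    pass


result = run_saga([
    ("hold_fiat", hold_fiat, release_fiat),
    ("send_crypto", send_crypto, noop),
])
print(result, balances)
```

In this in-memory version a crash halfway through the undo loop would lose track of what was compensated, which is the gap a durable workflow engine is meant to close.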

Shawn: The third one is infrastructure provisioning. And so, this is something that we actually just had a meetup about. HashiCorp uses us or uses the open source version of us, because they're not a customer, to build HashiCorp Cloud Platform. It just powers all the spin up and spin down of their clusters, which is pretty cool. All of these are long running tasks, if you view them in the right light. And so, the argument is that you need to use workflows as code to interface with those SDKs, whatever SDKs they are. And for HashiCorp, they actually use Terraform as their SDK for interfacing with whatever cloud they're on, and that seems to work pretty well for them. And my favorite quote from Mitchell Hashimoto, who's actually an advisor to us is, he says, "If Temporal did not exist, he would have had to build it for HCP." To me, that's the best endorsement for a provisioning use case that we can get.

Shawn: We're also pretty good for monitoring and polling. So I think that's another blocking and tackling bread and butter thing. I just got off a call with a customer who is using us for social media monitoring, essentially getting pings from when some things, when some mentions or something spikes in mentions, he sends notifications to his customers and that's a service that he sells. And we're really good at that because essentially what you need to do is you need to do distributed cron, distributed polling. You have a bunch of items. He has a million things. He has a million hashtags or something under his watch list. It's an Instagram polling service. And then he needs to farm it out to a bunch of workers. The workers need to complete, or if they fail, they need to retry, and all that good stuff. And then he needs to collect all that data, run reports on it, and then send it out, again, to his hundreds of thousands of customers. It's a very generically interesting distributed systems problem, which I really enjoy.

Shawn: The final one I'll highlight is actually DSL workflow. So, I just said that we are opinionated against domain specific languages, but we're actually really good as a base layer for interpreters of a domain specific language. So, imagine if you are ConvertKit or Drip or some kind of email based platform, and you want to offer a workflow system and you don't want to build the actual underlying workflow engine. You can use us under the hood, which Twilio does, and offer some kind of higher level abstraction to your non-technical users.

Rebecca: So, I love the idea of opinionated, non-opinionation, and a few different times that's already come up, right? I'm like, "Hey, how does Temporal think about serverless?" And you're actually, "We're not opinionated about that. It's about the use cases that we want to apply stuff to," but then there are certain things you're definitely opinionated about, right? Something that you had said around infrastructure provisioning is write your own control plane in languages you know best. That's a very specific statement around write it in the way that your teams are going to be successful in building your product and getting it out to customers.

Rebecca: So, I'm wondering if you have... Well, I'm going to caveat this with, I know Jeremy's really excited to talk about self-provisioning runtimes, so I know that's going to be the next question, but once we get there, I'm wondering if you can highlight a little bit around how you choose and evaluate what you should be opinionated about and what you shouldn't. Because I think there's probably always overlap between people are like, "Well, we should definitely have an opinion about this." And other folks are like, "No." And then how do you evaluate or make those trade offs around where you spend your time to have an opinion and where you're like, "It's fine. We're not going to touch this. This is not necessary to our core business."

Shawn: Wow. That is a very high level... I feel like it's above my pay grade, this question. It's very much like [crosstalk 00:20:57] you have to decide what matters to you. In a perfect world, we would just have everything, and no trade offs would ever exist. But we live in the real world and we have to make trade offs, and essentially we need to figure out, what priorities do we place above others? And so for us, we have chosen as a problem domain long running, mission critical workloads. Mission critical in a sense that they need to be either run to completion or failed. No limbo, no data loss. That's it, end of story, there's no other possible state.

Shawn: And so, we need to design the system to handle that, and we also need to offer the API primitives that help people code their application in the right way. So, I've said a lot about orchestration, but I will mention two other... I've said a lot about orchestration, and I also said a lot about the workflows code thing. The other opinion, the third opinion, which I find that I don't mention that much. I don't mention enough, but I'm going to mention here is that we use event sourcing and this is a thing under the hood that people find their way. Every now and then, a blog post blows up on Hacker News about like, "Oh yeah. Did you know that this technique exists?" It's actually just people understand it, and they really like it, but it's just hard to implement.

Shawn: And so, that's part of our job is that we slice off a part of the event sourcing problem for you and we implement it under the hood, so you never have to touch it. And so, what that means as well is that you can log... It also happens to neatly solve the distributed tracing problem as well because we have an immutable log of all your events and we never drop any work because if it's not in our logs, it wasn't done. So, we just retry it again. So, it's a very logical framework to proceed on, but it does have other trade offs. So, for example, in terms of CAP theorem, we choose strong consistency over high availability, or what's the other one? In other words, latency. We'll trade off latency for consistency.

Shawn: That's the kind of thing that we have to be honest about because there's so many other priorities out there that we have to respond to. But it's not to say that... Does that mean we're slow? We're not, because we have much more scalable infrastructure where that bottleneck doesn't happen compared to other workflow engines that have a central event loop. So, for one of the other popular workflow engines, which I won't mention by name, they actually have a core loop that only polls for new events every 30 seconds. In other words, you cannot respond to anything faster than 30 seconds. And that's the kind of thing that you'll only find out deep, deep, deep into the implementation because they don't tell you about it because they weren't built for that. By the way, this framework is meant for data pipelines, which were processed once a day. It's a reasonable assumption for the designers of that framework to just base everything on a central heartbeat of 30 seconds. But for us, everything needs to be fired off pretty much plus minus to a second. That's the design concerns that you have to really get into as you make your technology choices.

Rebecca: I think you just earned a new pay grade because that was a lovely answer. So thank you for that answer.
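
The idea that "if it's not in our logs, it wasn't done" can be illustrated with a toy event-sourced runner in Python. This is a sketch of the general technique only, not Temporal's internals: `EventLog`, `run_workflow`, and the `charge` and `email` steps are hypothetical, and the log lives in memory rather than in durable storage. On each run, steps already recorded in the append-only log are replayed from their recorded results, and only the missing steps are executed.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str]  # (step name, recorded result)


class EventLog:
    """Append-only log: events are added, never edited or removed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, step: str, result: str) -> None:
        self._events.append((step, result))

    def events(self) -> Tuple[Event, ...]:
        return tuple(self._events)


def run_workflow(steps: List[Tuple[str, Callable[[], str]]],
                 log: EventLog) -> Dict[str, str]:
    """Replay recorded results; execute only steps missing from the log."""
    recorded = dict(log.events())
    state: Dict[str, str] = {}
    for name, action in steps:
        if name in recorded:
            state[name] = recorded[name]  # already done: reuse, do not re-run
        else:
            result = action()  # not in the log, so it was not done
            log.append(name, result)
            state[name] = result
    return state


# Demo: the worker "crashes" during the second step, then the run is retried.
executions: List[str] = []
crash = {"on": True}


def charge() -> str:
    executions.append("charge")
    return "charged"


def email() -> str:
    executions.append("email")
    if crash["on"]:
        raise RuntimeError("worker crashed")
    return "sent"


steps = [("charge", charge), ("email", email)]
log = EventLog()
try:
    run_workflow(steps, log)
except RuntimeError:
    crash["on"] = False  # pretend a fresh worker picks the work up
print(run_workflow(steps, log), executions)
```

The card is charged once even though the workflow ran twice, because the second run found the charge in the log. The same log doubles as a trace of everything that happened, which is the tracing benefit Shawn mentions.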

Jeremy: It was a good answer.

Shawn: Hold up. I'll try one more thing on you, which I want to see because I'm really enamored by this and I don't know how much of a connection it makes. So, I think that a lot of the services that we use are basically custom databases, like fancy databases. Like a search engine is a fancy database. An analytics engine for storing your click tracking or whatever is also a fancy database. And so, my high level abstract opinion of Temporal is that it's just a fancy database for your long running work. Just like search engines, just like analytics engines, you don't want to write your own workflow engines yourself, but for some reason, most backend developers have been writing it for the past however many decades this has been a thing. And so, that's what I'm moving myself towards, which is yes, it's very hard to explain a single use case for this because it's so general purpose just like you would have a search engine and an analytics engine, and then you do whatever you want with it. That's where we are with workflow engines.

Rebecca: I'm just honestly imagining a database in a bow tie.

Shawn: Yeah, they're basically...

Jeremy: All right. So, let's move on to self-provisioning runtimes because I've been trying-

Rebecca: Jeremy has been waiting for this. He's like, "Ugh."

Jeremy: ... to hold back in order to get here, but I'm like, now's the time. So over a year ago now, Doug Moscrop and I, we started working at Serverless Inc. Started working on this idea behind the scenes thinking what if you didn't need to write infrastructure code? Why isn't code smart enough or why can't a system be smart enough to take code and say, "Hey, here's what I need to run it." And this idea of the cloud computer or whatever. So, we're working on this thing for nine months, whatever. And then all of a sudden this article comes up, Self-Provisioning Runtime. And I read the first line. If the platonic ideal of developer experience is a world where you, "Just write business logic," the logical end game is a language plus infrastructure combination that figures out everything else. I read this. I put my arms out like this, lights light up, music starts playing, dogs fly out from behind me. I'm like, "Yes, yes that. Why do more people not get that?" Anyways, so I will not do it justice. So, can you just quickly explain what you mean by self-provisioning runtimes?

Shawn: I feel like people should just look at Serverless Cloud and they'll kind of get it. Essentially, what if your system, whatever system you're running in, understood your program enough that it was able to provision its own resources to run that program. And so, right now, we're very used to provisioning resources and then running the program. And then whenever... The whole concept of DevOps is essentially like, "Okay, program ran out of resources. So we needed to spin up more resources or fix the resources," or whatever. But why don't you just read your program that you're freaking running and just figure it out? I don't know, you tell me. And so, obviously that's abstracting over a lot of complexity, but that's the end goal. That's where this all ends up.

Shawn: And so, ultimately, this blog post came from a number of pain points. Again, I was still working in AWS at the time and the new hotness is AWS CDK, and Pulumi as well. Infrastructure as code. Literally as code, not as [crosstalk 00:27:48]. Exactly. But actually as code that you can program and reuse. So, all that good stuff about software engineering practices applies to CDK and Pulumi as well. But I was like, "Okay, this is great, but also we're just building up a whole bunch of stuff just to compile down to CloudFormation," or whatever. And then on the other side, we're still reading in a bunch of config values and then carrying on with the actual program logic. Why don't we... Who's like merging the two? This is obviously the end game, right?

Shawn: So, I have this graphic here of these abstract little blocks because I've been looking at language theory design. I think the formal term for this is PLT, programming language theory. And I have some friends in that field and I always listen to them, but they keep talking about types. They're like, "Oh yeah, if we had a better type system the world would be a much better place." And I'm like, "Sure." Stronger type system, monads and all that good stuff. But one of the main advances in programming language that has still not been beaten is just programming languages that took care of memory allocation for you. Like had automatic garbage collection or they just got rid of the concept of having to manually manage memory registers.

Shawn: And that's a huge step advance. But that only happened because we had just assumed that the runtime would take care of it. And yes, it's not perfect. Yes, you have performance trade offs, and yes, sometimes you have to opt out of it and go down to a lower level language, but 90% of developers don't need that. And so, similarly, 90% of people who work with cloud don't need to know the underlying implementation details of what kind of storage and what bucket to put where. Just figure it out for me, and let me just work on the app. So, that's where I'm at. I want programming languages to advance to the point where they can just assume the runtime to do it. And I want infrastructure provisioning to advance to the point where they can read programming languages and figure out what they're supposed to provide. Does that ring a bell? No?

Jeremy: No, no. I mean, what I'm thinking here is I look at this and I say, "This is not... This is almost... The idea of self-provisioning runtime is we only need to provision infrastructure now because we've complicated the infrastructure in a good way." I mean, splitting things out to separate services for queues and for databases and for workflow engines and things like that I think makes a lot of sense, but go back. I mean, think about Ruby on Rails, right? I mean, you could distribute it mostly right on the same server, but even just changing your data structure by changing your code was a really interesting approach there.
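
A toy version of "read the program and figure out what to provide" might look like the Python sketch below. Everything in it is hypothetical: the `App` class, its `get`, `every`, and `table` methods, and the resource labels in the plan are invented for illustration and are not the API of Serverless Cloud or any other product. The point is only that the program never names infrastructure directly. It uses routes, schedules, and tables, and the runtime derives a provisioning plan from that usage.

```python
from typing import Callable, Dict, List, Set


class App:
    """Hypothetical runtime that records what the program uses."""

    def __init__(self) -> None:
        self.routes: Dict[str, Callable] = {}
        self.schedules: Dict[str, Callable] = {}
        self.tables: Set[str] = set()

    def get(self, path: str) -> Callable[[Callable], Callable]:
        def register(fn: Callable) -> Callable:
            self.routes[path] = fn
            return fn
        return register

    def every(self, interval: str) -> Callable[[Callable], Callable]:
        def register(fn: Callable) -> Callable:
            self.schedules[interval] = fn
            return fn
        return register

    def table(self, name: str) -> Dict[str, dict]:
        self.tables.add(name)
        return {}  # in-memory stand-in for a managed data store

    def plan(self) -> List[str]:
        """Derive the resources to provision from what the code declared."""
        resources: List[str] = []
        if self.routes:
            resources.append("http_api:" + ",".join(sorted(self.routes)))
        for interval in sorted(self.schedules):
            resources.append(f"scheduler:{interval}")
        for name in sorted(self.tables):
            resources.append(f"key_value_store:{name}")
        return resources


# The "business logic": no infrastructure config anywhere in sight.
app = App()
users = app.table("users")


@app.get("/users")
def list_users() -> list:
    return sorted(users)


@app.every("1 hour")
def cleanup() -> None:
    users.clear()


print(app.plan())
```

A real system would have to go much further, for example sizing resources, wiring permissions, and handling changes between deployments, which is the complexity Shawn says is being abstracted over.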

Jeremy: But now we've gotten to the point where developers, and this is, I guess, maybe the evolution of where we are with serverless and maybe not so much if you're still running VMs or even containers to some degree, but you are in there and you're hopping back and forth between that. Tell me what primitives or here's the code that I want to run, and here are the primitives that I need to run that, but that mixture now it's like the DevOps. You call this the DevOps divide, I think. What's the point now? Isn't everyone DevOps, right? You're developing and you're doing operations at the same time. So, yeah, there's a whole management and monitoring and observability thing maybe later on down the road that might have some separate responsibilities, but when you're writing code, that's what you're doing now. You're saying, "I need to run this on a Lambda function or in a Fargate container or on Kubernetes or whatever I need to do." And you need to know that's where your code is going anyways. So, why even have this split between writing code and writing infrastructure?

Shawn: I couldn't say it better myself. So, yep, co-signed.

Rebecca: Jeremy was like, "I was actually just reading from your blog post. You did say it."

Jeremy: Well, I mean, so let's talk a little bit then about some of the... Because Serverless Cloud, we can talk about that briefly, but there are other companies that are doing this, too, and they're going down this path. And the first one I want to talk about is Dark Lang because I think Dark Lang's really interesting, and you actually put in the post about is it a run time or is it a language? Can it be a run time and not a language? One thing that is interesting about Dark Lang is it is sort of its own proprietary language, which probably feels very familiar to... Well, not familiar, but similar to Svelte where Svelte has its own proprietary way of writing things. You know what I mean? So, it's a little bit different than just writing JavaScript or Go or whatever it is. So, talk a little bit more about that though, the difference between writing your own language and then just having a run time that can interpret regular code with maybe SDKs.

Shawn: I really like your comparison with Svelte because that's something that I haven't really thought about before. So, Svelte hacks their way into acceptance by being a super set of HTML. Whereas, if you had to invent a completely new language, then everything is basically up for grabs in terms... Or you cannot assume anything. You'd have to learn everything from scratch. Every single language construct, you can't assume anything. Whereas, being a super set of an existing known language is a really nice way to extend upon something that people are at least productive and confident with. So, that's a really nice thing.

Shawn: Well, Dark Lang, it's no secret that Dark Lang has struggled, even though it's got Paul Biggar behind it. But the experience that I had when coding with it was just amazing, and it was exactly that self-provisioning runtime thing where if I needed a database, I spun it up right next to my code, and I saw that data come in and I could click around with it, and it flowed through my data and I could see the values that came out of that. I've never seen that anywhere else, and it just really blew my mind. So, I really like that concept, and I think that for sure was one of the major inspirations for what I think the self-provisioning run time could be. But I mean, there's definitely others. I think obviously Serverless Cloud, I had no idea that you guys were working on it, but Doug's phrase for it I think is really compelling, which is essentially infrastructure from code. Inferring infrastructure from code instead of writing code as infrastructure and then building it out and then having code consume that infrastructure, whatever. It's just all sorts of loopy.

Shawn: But I've also had that aha moment with Begin, which is Brian LeRoux's Architect framework that's hosted because he includes DynamoDB as part of his serverless functions, and it's there if you want it, and you just require that data, you put stuff in it, get stuff out of it. I don't really have to think about anything else apart from just writing import Begin/data. That's really amazing. And so, there's a bunch of other small little dabbles, and stabs at it. But I think Serverless Cloud has been the most advanced implementation that I've seen.

Jeremy: Well, I appreciate that. We've been working pretty hard, but some of those other ones you mentioned, though, they are... I think what you said. So, again, we talk about is infrastructure from code, right? You say self provisioning run time. If you look at say Compiler.run, they have this concept of what they call infra free, right? Let me say a higher level abstraction enabling feature developers to write infrastructure free scalable code with virtually no learning curve. So, similar to infrastructure. I mean, they were all the same concept here. And then Lambdragon is just about... Actually, I love Lambdragon because it talks about reducing the size of your code base, which again is a huge thing. Code is liability. So, every line of infrastructure as code is also a liability.

Jeremy: You mentioned Begin. Love Begin, Brian and I, we've known each other for quite some time now. I love what Begin does or what Architect does, because again, it simplifies the abstraction to the primitives, but that's one of the big things though that we've been trying to move away from is mapping directly to primitives because that's what you do with Architect. That's what you do with the Serverless Framework. That's what you do with CDK or any of these other things. And even though CDK and Pulumi have a concept of constructs, and I forget what they're called in Pulumi, but essentially a way that you can package a bunch of things together. You're still at the end of the day relying on provisioning specific infrastructure, and tuning that infrastructure based off of DSLs, essentially, at the end of the day because that's what they compile down to.

Shawn: Yeah. So, I think this is a kind of market maturity thing. Who are you building for and who can you get or convince enough to bet on you? So you have to walk a fine line between here's the end game that I really want to get to and are they ready for it right now? And so, a lot of people who are familiar with the existing paradigms of today want that degree of control, right? They're not comfortable with you deciding for them. Ultimately, that's where it's going to have to get to for it to be truly ubiquitous. But right now the people that you have to convince to make sure that this is a thing to sell it into large organizations, they want that level of control because then they can just port over existing apps to that and just run it, and start seeing the benefits from that. So, I think it's a fine line. I don't blame them for the pragmatism that they're displaying.

Shawn: I definitely see, I think it's similar to what we're talking about earlier with intents. Essentially, you have an intent on a certain type of resource and you don't actually care about what the actual instance of that is and what cloud it's on even, but you optimize that all for me, including the costs. That's essentially what you're going for. I think we have to step it up, and slowly step it up over the next 10, 20 years of how much control we give to people and how much we do for them. But ultimately, if we're talking about best developer experience, you advance developer experience by increasing the number of things that people can do without thinking about it, which is the Alfred North Whitehead quote, which I really love. Anything to do with technology, that is the source of magic that you don't even think about that anymore because you just take it for granted. It's so boring. Why spend a whole one hour podcast talking about it? It just happens. That's where we want to get to, but the journey to there is a lot of convincing people who are stuck to the old way.

Jeremy: Yeah. So, I want to... I'm going to let Rebecca finally ask a question because [crosstalk 00:38:17]-

Rebecca: You do your thing, Jeremy.

Jeremy: I have one more.

Rebecca: You have a ride today.

Jeremy: One more on this. So, you are a former Netlify person. So, Netlify recently announced that they acqui-hired Simon Knott, and he developed Quirrel, which is sort of a workflow engine. It's a queuing tools-

Shawn: Nah.

Jeremy: Okay. All right. I'm stretching there. But anyways, but one of the things that they brought over because I think was all inspired by the work that Simon had done was scheduled functions, right?

Shawn: Yes.

Jeremy: So now the way that scheduled functions work, and again, I saw it and I'm like, "Import, schedule." I'm like, "Oh, that's exactly what we do at Serverless Cloud." But anyway, but I want that because this is the direction I want it to move, and this is my point about control. You said you have to give people a certain level of control. So if you look at the syntax for Serverless Cloud, you look at the syntax for Netlify for scheduled functions, it's basically schedule dot, or schedule and then parentheses for Netlify.

Jeremy: But essentially you pass in an argument that says what you want that schedule to be. We do schedule.every, 10 minutes, or schedule.cron, and you can put a crontab in there and it's similar to Netlify. But essentially what you're doing in your code is you are expressing intent. I want this to run every hour or whatever it is. You can do that with other things. So, even APIs, we have a concept in Serverless Cloud where you say api.get, and you do an endpoint, whatever it is. And then you pass in a handler function. That is very easy for our system to interpret that you want that to run when somebody calls /whatever on an API gateway. But what else we do is we add a little bit of config that you can pass in where you can say, "I want the timeout to be 10 seconds. I want the timeout to be..." Well, it can't be an hour, but now I want the timeout to be 15 minutes for this particular function.

Jeremy: So, you can pass in little bits of configuration that we don't let you pass in the memory and some of these other things, but for the most part, when you write in the Serverless Framework, or in Architect, or in CDK, or in CloudFormation, or whatever, or Terraform, and you say, "I want a Lambda function. I want it to point to this snippet of code or this zip file. I want it to run for 30 seconds. I want it to react to an HTTP event that does this, whatever." That's duplicating what you've already written in the code. So, I'm sure there are going to be limitations that we're going to hit up against. You can't do this. You can't express that in code, maybe. But just that to me seems like isn't that the better place to put that type of developer intent? I want this to run for 30 seconds. I want this to... Even if it was a memory configuration or whatever, it just seems to me that's the better place to put it, and it still gives a lot of control.
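To make the "intent in code" idea concrete, here is a minimal sketch of how a runtime might record declarations like api.get and schedule.every and then infer what to provision from them. Everything in it is an assumption for illustration: IntentRegistry, resolveTimeout, inferInfrastructure, the 10-second default, and the 15-minute ceiling are hypothetical names and values, not the actual Serverless Cloud or Netlify APIs.

```typescript
// Hypothetical sketch only: these names are illustrative, not a real SDK.
type IntentHandler = (...args: unknown[]) => unknown;

interface FunctionConfig {
  timeoutSeconds?: number;
}

type DeclaredIntent =
  | { kind: "http"; method: "GET"; path: string; timeoutSeconds: number; handler: IntentHandler }
  | { kind: "schedule"; everyMinutes: number; timeoutSeconds: number; handler: IntentHandler };

interface InferredResource {
  type: "http-route" | "timer";
  description: string;
  timeoutSeconds: number;
}

const DEFAULT_TIMEOUT_SECONDS = 10; // assumed default
const MAX_TIMEOUT_SECONDS = 15 * 60; // assumed platform ceiling

// Apply the default and reject timeouts the (assumed) platform cannot honor.
function resolveTimeout(config: FunctionConfig): number {
  const timeout = config.timeoutSeconds ?? DEFAULT_TIMEOUT_SECONDS;
  if (timeout <= 0 || timeout > MAX_TIMEOUT_SECONDS) {
    throw new RangeError(`timeout must be > 0 and <= ${MAX_TIMEOUT_SECONDS} seconds`);
  }
  return timeout;
}

// Collects what the application code says it wants, without provisioning anything.
class IntentRegistry {
  readonly intents: DeclaredIntent[] = [];

  readonly api = {
    get: (path: string, handler: IntentHandler, config: FunctionConfig = {}): void => {
      this.intents.push({ kind: "http", method: "GET", path, timeoutSeconds: resolveTimeout(config), handler });
    },
  };

  readonly schedule = {
    every: (minutes: number, handler: IntentHandler, config: FunctionConfig = {}): void => {
      this.intents.push({ kind: "schedule", everyMinutes: minutes, timeoutSeconds: resolveTimeout(config), handler });
    },
  };
}

// Turn the recorded intents into a provisioning plan.
function inferInfrastructure(registry: IntentRegistry): InferredResource[] {
  return registry.intents.map((intent): InferredResource => {
    if (intent.kind === "http") {
      return { type: "http-route", description: `${intent.method} ${intent.path}`, timeoutSeconds: intent.timeoutSeconds };
    }
    return { type: "timer", description: `every ${intent.everyMinutes} minutes`, timeoutSeconds: intent.timeoutSeconds };
  });
}

const app = new IntentRegistry();
app.api.get("/users", () => ({ users: [] }), { timeoutSeconds: 30 });
app.schedule.every(10, () => "tick");

const plan = inferInfrastructure(app);
console.log(plan.map((r) => `${r.type}: ${r.description} (${r.timeoutSeconds}s)`));
```

The point of the sketch is that the timeout lives next to the handler it applies to, so there is no second file restating the same route or schedule.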

Shawn: Well, you're not going to find a lot of disagreement here.

Jeremy: That's all I want, Shawn. I just want you to agree with me.

Shawn: You want validation.

Jeremy: That's right.

Shawn: I'm just going to guess I'm going to agree with you all the time. No, I think that's good. I'll say this. So, I'm going to speak out. There are probably listeners who are basically screaming at their headphones right now because conciseness is not the end-all and be-all, right? Sometimes it's okay to just duplicate things for some type safety, for some speed, for whatever. It doesn't come at a cost because if everything's implicit, then yes, you're going to lose some things along the way. So, we should be measured about this. Conciseness doesn't win all debates, essentially is what I'm saying. So, yeah. I mean, there's that.

Shawn: The other thing I'll also point out. So, I want to push back on one thing, which is equating scheduled functions to workflows, which is the whole thing of... Obviously, I'm professionally aligned to do this, which is that you can build the beginning of a workflow engine with a scheduler because every workflow engine needs a central event loop to heartbeat and to do whatever, but using scheduled functions to do the heartbeat essentially is not really scalable because you probably only have one instance of that scheduler in there. And so, imagine, having enough calls to execute that you run over into the second frame of whatever you're supposed to execute.
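The overrun problem can be seen with a small simulation. This is a sketch under stated assumptions: a single scheduler instance that runs one tick at a time, with simulateSingleScheduler and TickResult as hypothetical names and the durations chosen purely for illustration.

```typescript
// Hypothetical simulation of one scheduler instance that runs ticks serially.
interface TickResult {
  scheduledAtMs: number;
  startedAtMs: number;
  lateByMs: number;
}

function simulateSingleScheduler(intervalMs: number, workDurationsMs: number[]): TickResult[] {
  const results: TickResult[] = [];
  let freeAtMs = 0; // when the single instance finishes its current work
  workDurationsMs.forEach((durationMs, tick) => {
    const scheduledAtMs = tick * intervalMs;
    // A tick cannot start until the previous one has finished.
    const startedAtMs = Math.max(scheduledAtMs, freeAtMs);
    results.push({ scheduledAtMs, startedAtMs, lateByMs: startedAtMs - scheduledAtMs });
    freeAtMs = startedAtMs + durationMs;
  });
  return results;
}

// 30-second interval; the second tick has 45 seconds of work to do.
const ticks = simulateSingleScheduler(30_000, [10_000, 45_000, 10_000]);
console.log(ticks.map((t) => t.lateByMs)); // [0, 0, 15000]
```

Once one tick's work exceeds the interval, every later tick inherits the delay unless the work is spread across more than one worker.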

Shawn: And then also, you're limited to the narrowest window that you can possibly execute on, which is for the framework that I mentioned was 30 seconds. And then there's a lot of things else that you need to build on top of that. So, that's the thing I worry about, which is people go to the scheduled functions and they're like, "Okay, job done. I have my entire stack that I can do anything with." And then they realize, no, the job running process or the task of job running is actually longer than that. But I have a fun story. I worked at Netlify. They are my second biggest shareholding in my net worth. So, I care a lot about them succeeding. So, right after they announced it, I think two days ago or whatever they announced their launch. I actually opened the first issue and said, "Here are the other 10 product features that you should build because I don't want you to think you're done building here." So, I included jitter in there. I included manual triggering and pausing.

Shawn: So, did you know? For example, I've been doing scheduled functions on Netlify for a long time because I use GitHub Actions to schedule the functions. Anyway, there's all sorts of cron everywhere, but crons fail. I've been tracking... I set up my project to track GitHub Actions. I ping my Netlify function every hour. It's failed 10 times over the past six months. So, do you have a process for recovering from failure? Do you have a process to backfill or to manually trigger when you need some job done? There's all these other nuances that don't get considered when you compress everything into a little config that's just the crontab syntax. And then you link people to the nice little reference of star, star, star, and then those little lines that everyone includes. You're not done. Time is such a fantastic complex beast. Oh, yeah. Let's talk about holidays and time zones. But time is such a fantastic and complex beast that I'm only appreciating after coming to someplace like Temporal. And so, that's hence the name.
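Two of the features mentioned here, backfill and jitter, can be sketched in a few lines. This is only an illustration under assumptions: findMissedRuns and withJitter are hypothetical helpers, the schedule is a fixed interval rather than a full crontab expression, and every slot since the last recorded success is treated as missed.

```typescript
// Hypothetical helpers; a fixed interval stands in for a real cron expression.

// List every scheduled slot after the last recorded success, up to and including now.
function findMissedRuns(lastSuccessMs: number, nowMs: number, intervalMs: number): number[] {
  if (intervalMs <= 0) throw new RangeError("intervalMs must be positive");
  const missed: number[] = [];
  for (let t = lastSuccessMs + intervalMs; t <= nowMs; t += intervalMs) {
    missed.push(t);
  }
  return missed;
}

// Spread start times so many jobs scheduled for the same instant do not fire together.
function withJitter(scheduledMs: number, maxJitterMs: number, random: () => number = Math.random): number {
  return scheduledMs + Math.floor(random() * maxJitterMs);
}

const HOUR_MS = 60 * 60 * 1000;

// Last success at t=0, an hourly job, and it is now 3.5 hours later.
const missedRuns = findMissedRuns(0, 3.5 * HOUR_MS, HOUR_MS);
console.log(missedRuns.map((t) => t / HOUR_MS)); // [1, 2, 3]

// Re-run each missed slot with up to 60 seconds of jitter.
const backfillTimes = missedRuns.map((t) => withJitter(t, 60_000));
console.log(backfillTimes.length);
```

A real system would also need to persist the last success durably and decide whether to replay every missed slot or only the most recent one; the sketch leaves both decisions out.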

Rebecca: Crons fail is an amazing bumper sticker. I like that idea. Just crons fail, period. So, what are you going to do about it?

Shawn: I don't actually understand how crons... Basically, the system has to go down, but I don't know. It's such a simple thing, and you think would never fail because how could it? So, you don't plan for it. And then when it does, you're like, "Oh, who could have saw that coming?"

Rebecca: That's why I think it's so humbling because it is "the simplest thing." You're like, "Oh, if anything, that's taken care of," and it's like, joke's on you. Nope.

Jeremy: [crosstalk 00:44:56]. I was going to say it's probably the source of a lot of ghosts in the machine. It's like that thing, why didn't it work? But then it works every other time. Those random things and you don't know because there's probably no logging that your cron failed. It's just like, "Oh, it didn't call that function for some reason."

Shawn: Yeah, totally. So, I mean, I'm working on this blog post. I haven't put it out yet, but it's actually this, what is the most over-engineered distributed cron on steroids? What is the maximum amount of requirements you can stick on this thing because that's essentially what we're building internally. So, I just wanted to go through the hundreds of different little product decisions that you can really dive into when you start getting serious about scheduling in cron.

Rebecca: Yeah. I want to pull on that thread a little bit, or almost the opposite of we were talking about the platonic idea of developer experience. And that would be a language infrastructure combination that figures out everything else for you. And so, it goes back to what might be the mind set of serverless, which is like abstract away the things that take away from you being able to focus on writing your code and building your app, full stop. And you wrote this really great post around we build mucks that you don't have to. Let's say, and why AWS is too hard for developers, essentially.

Rebecca: And so, we have been, and maybe it's in our DNA, for Serverless Chats is that we've been asking so many of our guests around, okay, what happens when you end up building so much to abstract away the muck that over time the solutions themselves become points of failure. So, when does building abstraction only add complexity and weight to something? I want to know if you have a... I mean, I'm certain that you have an opinion on this, but this idea of at what point does something become so abstracted that it actually is not helpful anymore? Are there moments where you've had to pull back what you were going to build because you're like, "Actually, this simplicity is more of a burden and it's going to end up filling."

Shawn: Yeah. I think we do a lot of that and it's never clear, and what's over abstraction to you may be just right for me. That's the beauty of software is that you can always solve a problem by going up another layer of abstraction, but then also you're adding an extra problem because abstraction has a cost. And so, I think people are constantly trading that off against each other. I will say that I think... So, my favorite story about this actually comes from, I think it's Benedict Evans. He was an OG Microsofter in the '90s, and he talks a little bit about the word processors back in the day like Microsoft Word or whatever was before it.

Shawn: Back in the day, you had to buy plugins to get word counts, to get page numbers, to get horizontal layouts. And all of these were individual plugins like $50 for some really routine thing that you take for granted today. And that's just because word processors started out simple. And then as we grew and took them and adopted them and took them for granted then we understood that the job of a word processor expanded into having more and more of these features. And now you just take that entire suite of functionality and that is a word processor to you. Anything less is just not even worth considering. And that's a function of the products or the product category being mature.

Shawn: And so, I think it starts out simple, then it gets complex with more abstraction, and then that gets absorbed into that first underlying layer. So it's about, I think, the number of layers that you want to manage and you want to let that grow initially because you don't know. As the core platform designer, you don't know what the right abstraction should be. But as you figure out what people use you for, then you absorb parts of that into your program. And that is going to cause friction because people are building businesses on top of you and relying on you for that. And then you're going to take their livelihood away. Sometimes you're going to be hauled up in front of the European Commission for bundling your browser and your operating system, and you have to deal with things like that too. But now what operating system does not come with a browser? So, there's all these questions that are just not settled. But I think if you think about it in terms of the long arc of history, it makes sense. It makes sense that things get complex and then they simplify.

Jeremy: So, one of the things again about complexity and finding the right level of abstraction, all these other things I think is a relevant topic to where we are or I guess a relevant conversation around containers and Kubernetes in the cloud right now. So, I mean, I would love to get to a world of self provisioning run times. It's absolutely where I think we need to go. But right now there's this, I don't know, there's a gap between even just going full serverless or service full versus baking in a lot to this orchestration with things like Kubernetes.

Jeremy: I look at Kubernetes and containers, honestly, as a stop gap. I feel it's like the hybrid cars where it's like, we're all going to get to electric cars someday, but we got to do this hybrid of electric and gas for a while until the infrastructure gets there. But eventually the infrastructure's going to get to a place where you don't need Kubernetes and you probably don't need containers. There's going to be a better way to do it. Run it close to the metal. I've said this a million times. So, listeners are probably bored by it, but I'm curious what... Again, because you're clearly a visionary thinker thinking about the self provisioning run times.

Jeremy: So, I'm just curious what your thoughts are on the trend or hopefully a trend that is moving more towards making serverless or I guess cloud native being just serverless. I mean, the cloud just becoming serverless essentially because that to me it makes more sense, but I don't know. I'm interested on your take on this is what you think about that movement. Is that where the cloud's going? Are we going to get to a point where you're not going to need the Kubernetes and the containers or we're going to be living with this for a very long time?

Shawn: So, the first reaction I want to say is even the people who work on Kubernetes like Kelsey Hightower is famous for saying he thinks Kubernetes will go away in five years. He said that for three years straight, but I think so. I think everyone's expecting that day that Kubernetes becomes an implementation detail rather than something that people wrangle on a daily basis. But the other thing I want to ask a question to answer your question is, do you think that serverless containers are serverless? So, the Fargates of the world, the Google Cloud Run.

Jeremy: You put me on the spot. So, I don't know. I mean, the way that I feel about those. I don't mind containers as a packaging format, but this idea of having to... I mean, if Fargate got to a point where it would spin up automatically based off of events, like more events coming to and so forth. So, if it had the orchestrator built into it. Like right now that is sort of, you got to use, what is it? ECS or EKS in order to manage the orchestration of the scaling of those pods. If that was just built into Fargate where it's just like, well, if I have Fargate running and it will just increase whatever. Then maybe I would feel more that that was more serverless. But I also feel like just having to think about the run time, having to think about the operating system, having to upgrade those yourself, those are just all things to me that feel, I don't know, they don't make me feel great.

Shawn: Okay. Got it. Yeah, I think they're getting... That's obviously the goal and as long as you can dockerize stuff, it's mostly declarative as much as Docker can be considered declarative. It's not, but whatever. So, in that sense, if that is serverless, then yes, the world is going serverless. But I think that part of the world is here to stay. I don't have much more insight than that. I definitely think that a bunch of... I call these cloud distributions, cloud distros. Some people call them layer two clouds. Essentially like the render.coms of the world, and Begin, and cover. There's a bunch of others, startups that are layers on top of the big clouds that offer some kind of advanced developer experience for a specific audience.

Shawn: I like that they are all trying to innovate on top of the... If they're server full or if they're... I don't even know what the term for this is, but they're trying to basically be the new Kubernetes or they're trying to abstract away Kubernetes for you so that you never have to worry about that. And obviously, that's a very nice positive. We have to see how it shakes out. I think it takes another five years to envision this. Meanwhile, the serverless people are just happily going along, and takes writing functions all the way, and the world upgrades for them and they never notice, and that's beautiful. That's obviously if you listen at all to Ben Kehoe, he's an absolutist about this kind of thing, and I try to push back on him, but every single time he knows the right thing to say. So, no, I think so. I think people want to give away that power.

Shawn: I will also say, I think one of the interesting trends that I do see is a counter trend towards repatriating from the cloud. I think you may have discussed this a little bit and people have mentioned it. I think because there was an incendiary post from Martin Casado at Andreessen Horowitz. But essentially, once you get big enough, yeah, you might want to go back to the server full paradigm because you need that control. You need that lower cost because you are paying an overhead for the amount of serverless that you're doing, and yes, people should consider TCO when doing these calculations, but at some size it does make sense.

Jeremy: Right, totally.

Shawn: And then there are companies like Oxide Computer Company that are basically providing the hyperscale quality machines for you to build your own data centers, and it's like what are you doing? You're going against the trend, but that's what the people who are really big actually need. So, I don't think serverless can be a one size fits all. It can be one size fits most, and that's what it should do, and that's fine.

Jeremy: Yeah. I think you get to the point, too, and speaking of Kelsey Hightower, I actually saw him speak a couple years ago. One of the things he said about serverless was, if you get to that point where you get somebody else managing your server, that's fine. But if you really need more control, who cares? If you're making money off of the service, whatever, you'll pay people to watch your server. You know what I mean? If you want that control, you can bring that in-house or whatever you're doing, and I always thought that that made a lot of sense, too. But yeah, no, totally agree. I think it's this matter of especially when you're getting started, it's how much complexity do you want to take on, and how big can you get before that complexity needs to... Where you need to take on that complexity. But I still think Kubernetes will go away.

Shawn: Hey, I mean, it's a big world. Every cloud is expanding 40% year on year, basically.

Jeremy: It's crazy.

Shawn: Which is nuts, and it's such a big world that you're going to have a lot of diversity for a very, very long time. People are still running COBOL. In the same way, people will still be running Kubernetes 50 years from now. So, define go away.

Rebecca: Such nostalgia, there'll be nostalgia around Kubernetes, and 50 years from now they're like, "Oh, remember when." So, I want to fulfill a promise that we made at the beginning of this podcast, which we would talk a little bit about being the Head of Developer Experience and what your role or title means. I want to couch in this idea. There's a number of words. I would say two that are most popular in terms of describing developer focused roles. So, there's developer relations, developer advocacy or developer advocates. I think that those are often visibly focused on outputs. Certainly, they do a lot of work internally and educate, bringing what's happening in the communities back and talking with internal and product teams. But a lot of those are focused on outputs like education to help developers succeed with what are increasingly complex products.

Rebecca: But I think developer experience leans a bit more toward inputs. So, like asking that question around what are we building and how do developers use and move through it? And let's pre-think. Not so that we don't have to build education afterwards to help them with it, but it's rather like how can we build this so that we don't retroactively have to design education just to help people start with it or something like that. And so, I'm wondering if you can talk a little bit about what it means to be a Head of Developer Experience both at Temporal and philosophically, if you would see that as inputs versus outputs based, and if we could all shift toward more developer experience input thinking, I also imagine the world might be a little better place for developers.

Shawn: I like that. I really like that framing of input output. I don't think anyone has put it to me that way before, but I think I agree with it. There's a limit to how far you can go because... And I'll put it this way. So, my role is a product role. I am on the product team. I report to the head of product at Temporal, and we do a bunch of things. So, I work with the docs team. I work with the devrel team, and I was the... Because we didn't have anyone else, I was the lead product manager for our TypeScript SDK. So, I was in the weeds designing the APIs that I was about to go write the docs for, and then also do the devrel for. So, that's the multi hat thing that you have to do at a smaller company. But definitely, as you mature, each of these become specialized roles that should have people who know what they're doing take care of each of them.

Shawn: I would say that also as well. I highly empathize with what you just said because I was a developer advocate at AWS and at Netlify. And when I found issues that I... A lot of times you're at the tail end of the value chain when facing customer problems. You speak the most with customers or potential users, but then you have the least power to do anything about it. So, often, I would write a blog post documenting, here's how you solve the problem that you should solve. But really the best docs are the docs that you don't have to read because the product is so intuitive. So, really what you should do if you care about developer experience is have somebody end to end going, "All right, we are hearing this a lot from the customers. Let's actually prioritize this." But it takes a company that buys into that end-to-end philosophy and typically designing your org chart that way helps to guarantee that.

Shawn: Whereas, if you have product that is completely separate and never talks to developer experience or developer relations, that's where you get... Where you ship your org chart in terms of developer experience that you have there. So, I do strongly believe that. I think about it in terms of concentric circles. At the core you get the product design, API design, and all that. At the early stages of the company, which is where I am, you have the most impact there because there's not that many other people, and you can have direct input as you go along. In larger companies, this is a specialized function with very long running commitments, and naturally the DevX people will be focused on other parts of the adoption curve.

Shawn: So, then you grow out from there and you think about docs, right? You can think about first party tooling like CLI tooling or UI tooling, anything that helps people that's not core, but it's still part of the engineering that contributes towards the developer experience. Then you think about the first party content that you start producing that is not in the core docs, but you still get people to integrate with you or to give talks about how they use you because that is really a different way of how people figure out how to use or get the most out of you. Then I often think about from there going to community, which is something I also am focusing on and hiring for hint, hint. That's where we, by the way, also engage a lot with people like Common Room because-

Rebecca: Thank you.

Shawn: You guys are the experts at managing community. Shout-out, right? I have to, and we've been blown away by Common Room. For anyone who is interested in building developer community, go hit Rebecca up. And that transition from DevRel to community is this transition from one-to-many to many-to-many, and having that ongoing engagement there. And then finally for me, the last thing I got to think about is the third party content, the user generated content that happens outside of community. Instead of us creating that content or having people talk to each other on a one off basis, having ongoing engage content. And some of that content can be job listings, which I really like as user content, which we have, by the way. On Temporal.io/careers, the first half is our jobs. And the second half is our users' jobs, and we want people to get hired because then they'll be super loyal to you, which is a fantastic growth hack. [crosstalk 01:01:37]. I mean, that's the entire journey in terms of developer experience, radiating out from the product, and hopefully the communication channels meet up and down that line of thinking.

Jeremy: So, that is absolutely amazing, and honestly, I think we need to do another episode with you just talking about developer experience and that process. Unfortunately, we are way over time.

Shawn: Yes.

Jeremy: But listen, Shawn, this was awesome. I thoroughly enjoyed this. I think the listeners hopefully learned a bunch of stuff and weren't annoyed by me being a little giddy talking about self-provisioning runtimes.

Rebecca: Never.

Jeremy: But anyways, if users want to... Users, geez, if listeners want to get a hold of you or find out more about what you're working on, Temporal, things like that, what are the best ways to find you online?

Shawn: Yeah, definitely hit me up @swyx on Twitter or swyx.io on my blog. That's about it.

Jeremy: Awesome, and we've got a couple of other places, too, GitHub, YouTube, LinkedIn. We'll put those all in the show notes. Thanks again, Shawn. It was awesome.

Rebecca: Thanks so much, Shawn.

Shawn: Thanks for having me.