Episode #72: Serverless Privacy & Compliance with Mark Nunnikhoven (PART 2)
October 26, 2020 • 58 minutes
In this two-part episode, Jeremy chats with Mark Nunnikhoven about why your online privacy is so important, what we need to think about in terms of compliance, how serverless helps us create more secure applications, and so much more.
About Mark Nunnikhoven
Mark Nunnikhoven explores the impact of technology on individuals, organizations, and communities through the lens of privacy and security. Asking the question, "How can we better protect our information?" Mark studies the world of cybercrime to better understand the risks and threats to our digital world. As the Vice President of Cloud Research at Trend Micro, a long time Amazon Web Services Advanced Technology Partner and provider of security tools for the AWS Cloud, Mark uses that knowledge to help organizations around the world modernize their security practices by taking advantage of the power of the AWS Cloud. With a strong focus on automation, he helps bridge the gap between DevOps and traditional security through his writing, speaking, teaching, and by engaging with the AWS community.
Jeremy: Yeah. So you mentioned two separate things. You mentioned compliance and you mentioned sort of legality, or the legal aspect of things. So let's start with compliance for a second. So you mentioned PCI, but there are other compliance frameworks: there's SOC 2 and ISO 9001 and 27001 and things like that. All things that I only know briefly, but they're not really legal standards, right? They're more of this idea of certifications. And some of them aren't even really certifications; they're more just a way of saying, here, we're saying we follow all these rules. So there's a whole bunch of them. And again, I think, what, ISO 27018 is about personal data protection and some of these things, and these are rules that they follow. So I think these are really good standards to have and to be in place. So what do we get... Because you said you have to make sure that your underlying infrastructure has the compliance that's required. So what types of compliance are we getting with the services from AWS and Google and Azure and that sort of stuff?
Mark: Yeah. So there's two ways to look at compliance... Well, there's three ways. Compliance you can look at as an easy way to go to sleep if you're having troubles; just read any one of those documents and you're out like a light. And then the other two ways to look at it are: a way of verifying the shared responsibility model, and a way of doing business in certain areas. So we'll tackle the first one because it's easiest. Us as builders, building on GCP or Azure or AWS or any of the clouds, they all have in their trust centers, or in their shared responsibility page, their compliance center, all the logos of the compliance frameworks that they adhere to. And what that means is that the compliance organization has said, you need to do the following things. You need to encrypt data at rest or encrypt data in transit.
You need to follow the principle of least privilege. You need to reduce your support infrastructure. Here are all the good things you need to do. And what the certifications from the cloud providers mean is that they've had an audit firm, so one of the Big Four, Ernst & Young or Deloitte, come in and audit how they run the service. So Azure saying, hey, we are PCI compliant for virtual machines, means that they are meeting all the requirements that PCI has laid out to properly secure their infrastructure. So that as a builder means that we know they are doing certain things in the background, because we're never going to get a tour. We're never going to get the inside scoop of how they do updates and patching. And frankly, we shouldn't care. That's the advantage of the cloud. Right?
It's like, it's your problem, not mine; that's what I'm paying you for. So compliance lets us verify that they're holding up their end of the bargain. So that's a huge win for everybody building in the cloud, whether or not you understand the mountain of compliance frameworks. The big ones are basically PCI; 27001 from ISO, which is basically just general IT security, we don't set our passwords to password, that kind of stuff, it's basic hygiene; and then the SOC stuff is around running efficient data centers. Right? So it's like, we don't let Joe wander in from the street and pull plugs, we have a process for that kind of stuff, so great there. And the others are if you're in a specific line of business. So if you're in the United States and you're doing business with the government, you need a cloud provider that is FedRAMP certified. Right?
Because that is the government has said, if you want to do business with us, here's the standard you need to meet. Therefore, FedRAMP is this thing that vendors and service providers can adhere to, which means they meet the government's requirements to do that. And most of these are set up like that. So even PCI is a combination of the big credit card processors. They've formed this third-party organization that said, anybody who wants to do business with us, so anybody who wants to take credit cards, needs to adhere to these rules. If you don't take credit cards, you don't care about the rules. So, that's the different way of looking at compliance. So it's very case by case. If we're building a gaming company, if we're taking in-app transactions like Fortnite through the App Store, that's a huge bonus they get, is that Apple covers the PCI side of it. If they were doing it themselves, they would then have to be compliant.
And if we're not falling under any of those, if we're just making a cool little game where you upload a photo and we give you back a funky version of that photo, we don't have to comply with anything, right? As long as it's just a promise to our users. So that's the general gist of compliance. I don't know why I did wavy jazz hands, but there it is.
Jeremy: Well, no, I think that makes sense. I mean, you need to do something to make compliance exciting because I think for most people you're right, it's a document they could read and easily fall asleep. If you have insomnia, then compliance documents are probably the way to go.
So the other thing you mentioned, though, is that, again, you are always responsible for your data. And I think up until fairly recently, there were no super strict laws on the books that were about privacy in general. And so obviously we get GDPR, right? What does it even stand for? The General Data Protection Regulation, right? Did I get that right?
Mark: Yeah, you did.
Jeremy: So, that is European, it has to do with the European Union, and that came out and that was really strict. And what they said was essentially, "Hey, I don't care if you're hosting in the United States, if you're Amazon or Google or wherever you are, if it is a European user's data that you have, then you are subject to these rules." And then very recently, I mean the same type of law, I don't know if they were modeled together, but the CCPA, the California Consumer Privacy Act, came out for the United States. And again, it was just for California residents' data, but it also extends and applies in all these different places.
So these are very strict privacy control rules. I mean, it even gets to the point where you're supposed to have a privacy control officer and some of these other things, depending on the size of your company. If we go back to this idea of where our data is being stored, so think about this, I am writing an application that uses DynamoDB, and my DynamoDB application has to de-normalize data in order to make it faster for it to load some different access pattern. Or, I'm using Redis or I'm using a SQL server that I'm backing up transactions, or I'm running data through Kinesis or through EventBridge. I mean, you've got hundreds of places where this data could go. Maybe it ends up in S3 as part of a backup, so I can run it through Athena and do some of these things. Now somebody comes along and says, "Hey, I have a right through GDPR and CCPA for you to delete my data and for you to forget me." Finding that data in this huge web of other people's services is not particularly easy.
Mark: Correct. So a little additional context around that, so CCPA is relatively new. When it was initially proposed, it was fantastic and then it got lobbied down significantly to the point where it doesn't even apply, unless you make at least $25 million a year. So it's not even...
Jeremy: Welcome to America.
Mark: Yeah, exactly. But it is a first test at scale in the United States as to whether or not legislation will work on that. And the reason it's in California is very specifically, a lot of the tech is based there. It is a good first step. So let's use GDPR as an example, because it's been out for two years now and there was a preview for two years before that, and it was 27 different nations coming together to figure out where they wanted to go. And we've got a lot more examples around it, but the core principles are the same, the United States is moving closer, but it's going to take a long time just because of the cultural differences, the political differences.
So GDPR really boils down for the users to something very simple. As a European citizen, I have the right to know what you know about me and what you're doing with that information. And if there's any issues with that, I have the right to ask you to remove it or to change anything that's incorrect. That's the user side of GDPR. And now there's a whole bunch of stuff behind that from the business side of GDPR, you already laid out one of the biggest challenges is, how the hell do I answer that question? Right? Especially if you're not building it fresh, if you have an existing application that was never designed with this in mind.
Now, the interesting thing for GDPR is that there are two very big sticks associated with it, which is why, as a security and privacy guy, I love it. It's not perfect. But the first stick is that if you do not take reasonable steps to provide security controls with your infrastructure, you can get a fine of up to 4% of your global turnover. So not profit, 4% of your global take. So if you make a billion dollars, you could be fined up to 4% of a billion dollars, whether that's profit, paying off debt or whatever. So that's for not doing the due diligence of adhering to something like an ISO 27001, or the basic security controls, right? So if I'm setting my passwords to password, I can get a big, big fine.
The second big stick of GDPR is if I know there's a breach and fail to tell you about it, I can get hit with another 2% of my overall global take for failing to tell you within an appropriate amount of time, and that appropriate amount of time is 30 days or less. The average law in the United States says it's best-effort for notification, or at most 45 days. So GDPR is a very big stick, lots of reasonability behind there from the user's perspective. But from a builder's perspective, what you just laid out runs counter to most of the things we're looking for, right? We are trying to optimize, we're trying to denormalize data. You mentioned S3; think about Glacier. With Glacier, just the costs alone: if I archive your personal data and shove it into Glacier, not only do I have to find it, I then have to pull it out and either remove it or modify it and then put it back. That is a huge thing.
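The Glacier problem Mark describes is the classic motivation for a pattern often called crypto-shredding: encrypt each user's data with a per-user key, and when a right-to-erasure request comes in, destroy only the key, so every archived copy, Glacier included, becomes unreadable without ever being touched. The sketch below is a toy illustration of the idea; the key names and email are made up, and the SHA-256 XOR keystream is for demonstration only, where a real system would use a managed key service and an authenticated cipher like AES-GCM.

```python
import hashlib
import secrets

def _keystream(key: bytes, length: int) -> bytes:
    # Derive a demo keystream from the key using SHA-256 in counter mode.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; XOR is symmetric, so the same call decrypts.
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

decrypt = encrypt

# One key per user: this key store is the ONLY thing a deletion request touches.
keys = {"user-123": secrets.token_bytes(32)}

# Personal data is encrypted before it fans out to backups, Glacier, etc.
archived = encrypt(keys["user-123"], b"jeremy@example.com")

# Right-to-erasure request: shred the key instead of hunting down the archives.
del keys["user-123"]
# 'archived' may still sit in cold storage, but without the key it's just noise.
```

Deleting a 32-byte key is one fast operation, which sidesteps the retrieve-modify-restore cycle Mark describes for cold storage.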
But again, like we talked about earlier, if you plan for this stuff ahead of time, it's not nearly that bad because it turns out when you look at this kind of data management through your application, there's actually a lot of benefits just to building your application, to being able to trace a piece of data through your system, to know what I know about you, Jeremy, as a user of my application, there are huge benefits because you lose these sort of legacy bugs where it's like, "Oh, you opened your account before 2008? Well you have this check mark instead of this one." That kind of stuff gets solved.
So for new businesses, I think if you understand it, it's a minimal cost, just because it's really getting the expertise in to help you do that design work. For existing businesses, though, it is a nightmare. Literally, people spent two years getting ready for GDPR, and then the regulators still gave them another year before they hit anybody with any substantial fines, because of the massive undertaking it is to actually build that kind of infrastructure.
Jeremy: Yeah, no, I mean, and that's the other thing. I guess my advice would be, if you're designing these systems and you're building these systems, I think people hopefully think about tenancy or multi-tenancy when it has to do with building sort of bulkheads around different clients. So especially if you're a SaaS company and you have a big client, you might want to separate out their data from other people's data and have those in separate places. You can do that to some extent even with user data, right? And so knowing what data you're logging, where you're saving data, using an identifier that maybe obscures that user.
I mean, one way that we tried to handle this in the past was only having personally identifiable data associated in one place, where it could be removed. So even though there was a unique identifier for that person, as long as you removed it in that one place, right (and backups, but at least you removed it in that one place), then you would essentially forget. So you'd have the data anonymized, but you'd essentially forget it. Now, does that go far enough? I don't even know. I've read the GDPR documents before. And I mean, I've read summaries of GDPR, of what the rules are, that I think were longer than the actual rules themselves. Because again, it is kind of confusing to go through that. So I think that's one thing; again, people have to think about GDPR, think about CCPA.
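The one-place-for-PII approach Jeremy describes can be sketched in a few lines. Everything here, the class name, the fields, the event shape, is hypothetical and purely illustrative, and as Jeremy notes, backups that contain the PII record still need their own handling.

```python
import uuid

class PseudonymizedStore:
    """Keep personally identifiable data in exactly one lookup; reference
    users everywhere else only by an opaque identifier."""

    def __init__(self):
        self.pii = {}      # user_id -> {"name": ..., "email": ...}: the ONE place PII lives
        self.events = []   # analytics rows carry only the opaque user_id

    def register(self, name: str, email: str) -> str:
        user_id = str(uuid.uuid4())  # opaque identifier with no PII baked in
        self.pii[user_id] = {"name": name, "email": email}
        return user_id

    def log_event(self, user_id: str, action: str) -> None:
        self.events.append({"user_id": user_id, "action": action})

    def forget(self, user_id: str) -> None:
        # "Forget me": drop the single PII record. The events remain for
        # aggregate stats but can no longer be tied back to a person.
        self.pii.pop(user_id, None)

store = PseudonymizedStore()
uid = store.register("Jeremy", "jeremy@example.com")
store.log_event(uid, "uploaded_photo")
store.forget(uid)  # uid now points at nothing personally identifiable
```

After `forget`, the analytics rows still exist for counting and optimization, but the identifier no longer resolves to a person, which is the anonymize-by-deleting-one-record effect described above.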
The other thing that's been around for quite some time around privacy for children has been COPPA, which I always thought stood for the Children's Online Privacy Protection Act. But I think you told me that it actually is the rule.
Mark: Yeah, the A is just made up.
Jeremy: The A is just made up. So thinking about Fortnite and YouTube and TikTok and all these things that children like to use and like to share stuff and are very quick to say, "Oh, I was born in 2007, I'm going to say I was born in 2005 so that now I'm over the age limit." And of course, there's no verification, there's nothing that stops somebody from doing that. So I'd love to talk about this because this is something, I mean, again, I'm a dad, I have a 14 year old and a 12 year old, and I will not say whether or not my 12 year old is using a Fortnite account that has an incorrect birthday on it, but it's possible she is in order to get access to that stuff. So what do we have to do from a privacy perspective and from a legal perspective in terms of protecting ourselves from this? Because this one has a lot of teeth.
Mark: It does. It absolutely does. So if we take it from the builder perspective, not the parental perspective, the builder perspective, there is a lot of, and we can cover the parental in a second because both of mine are under 13, so it's double whammy. But from a builder perspective, this is where you see in the terms of service that, again, nobody reads, it says you can't open an account if you're under 13 and I'm not a lawyer, thank God, I didn't even play one on TV, what that means is they're trying to shift liability to the user and saying, "If you lie to sign up, that's on you not me."
Because the nice thing about COPPA and its design is it does actually have a reasonable structure to it, to try to prevent companies from tracking kids under 13 online. So you see a lot of its impact in advertising. And so with YouTube at the start of the year, there was a huge push where YouTube basically asked anybody uploading any videos: is this made for kids? Do you think kids might be interested in this? Because if so, we're not putting ads against it, because we don't have the ability to turn off all the tracking in the backend, so it's ads or nothing. And there was a huge uproar around it and they've softened that interpretation, but it's because they got hit with a $170 million fine against this rule because they weren't following it.
So from a builder perspective, it's being aware that if you're serving to children, like if you have an application that is... So let's back up for two seconds and ignore the case where people are lying to get on, right? You need to put in that reasonable effort and, for the Fortnite example, say, "Hey, we rated it 13+ so it's not marketed towards children. We've said it's 13+ for maturity level, just like the movies are rated, the games are rated, and we've added in the terms of service that you shouldn't be playing, you shouldn't open an account, unless you're 13+." So we're pushing liability to the user. So in that case, you should probably be covered.
But if you're actually making something that covers kids and families, this is a very real thing: you need to adhere to the rules of the act, which essentially say you can't track kids, you can't advertise directly to them. So where this falls, a question I get a lot, is around schools, especially now that kids are back in school, or going back to school even remotely: G Suite for Education versus G Suite. They're the same thing software-wise, but very different things legally. And so school boards need to understand which services from Google fall under the G Suite for Education license, because that license follows COPPA to the letter and says, we're not tracking kids, we're not moving this, that, and the other thing. So when you're signed in as a child from the school board and then surf YouTube, the normal tracking doesn't happen on YouTube. It actually creates a shadow account that's not associated to your account, and tracks that, and doesn't link it back to you as a child. Whereas if you're a normal G Suite user who starts to surf YouTube, all that activity is linked back to your G Suite account.
So as a builder, if you're designing something that could be targeting kids legitimately, you need to understand that there's a very hard line: you can't do a bunch of tracking, you can't take the same level of PII, if any at all, and you need to provide adult controls. There's a whole bunch of things that are worth consulting an expert on to make sure that you don't fall afoul of them. On the parenting side, it's a great excuse to say no to social media for the young kids, when they're like, "I want Insta," and you're like, "You legally can't have it. You're not 13."
Jeremy: Right, right. Well, I mean, again, I think about those situations that pop up all the time, where people are building things that affect kids. I mean, Fortnite is a good example in the sense that, yeah, kids are going to play Fortnite. I mean, it's clearly made for kids. And again, you say 13 and older, and that's fine, but think about Netflix profiles, right? You create a profile for your kid and it tells you what your kid was watching. Now, that's good, right? Because you can go and see what your kid watches. But are they using that data for advertising or optimizing what they might show to your kid? If they're using that for recommendations, where does the privacy line kind of sit for those things?
Mark: Yeah. And that's a very good use case because a lot of kids I know, unless they're really young, don't want the Netflix Kids interface. They want the actual Netflix interface, right? Because Netflix Kids, for little ones it's great because it just shows you Dora and Teletubbies, cool, I can just click on the icon I want. For kids, once they pass five, they're like, "I want to search, I know what I want, I want Transformers or Glitch Techs," or whatever.
The interesting thing about COPPA is that optimizing your service is almost always the out; for a lot of these privacy regulations there's always outs. And this is where it really comes down: none of these regulations are perfect. There's the letter of the law and then there's the intention behind the law. And that really depends on company culture. Almost every company follows the letter of the law, or what they think they can argue that letter to be, because COPPA has very real fines behind it. "Silicon Valley" on HBO had an episode where they were freaking out because their chat app was popular with pre-teens, and I think it's a $48,000 or $58,000 per-user fine, right? So if you have millions of users, it's an insanely high fine. And that's great. We want that as parents, for that protection.
But the line, what your question really hits on, even with GDPR, even with CCPA, is: what is personal information? That's really the core question. And there's no clear answer, because what you think of as personal information and what the law thinks are very different. This is where a pet peeve of mine with Facebook in general comes in: when they're arguing in front of Congress, the first time Zuckerberg testified, they directly asked him and said, "Do you sell user data?" And he honestly, in his Data-like face, said, "No, we don't sell user data." Because they don't sell user data, because their understanding of user data, by their definition, is data that you have uploaded to the service. So your status updates, the photos and movies that you upload are user data, the things you type in are user data.
But what Facebook sells is access to your behavioral and demographic profile that they have created from user data. So what Facebook sells is not user data; Facebook sells access to data about users. Now, that seems like a super fine semantic hairsplitting thing, and it is, but that's the fundamental thing that you're talking about with even the Netflix example: what your kids watch, is that user data, or is that data about users? Because all this regulation protects user data, not data about the users, and there's a multi-trillion dollar economy dealing in data about users, and they don't care about user data.
Jeremy: Right, yeah. And I don't think we have enough time to get into all the details of that. But what I will say is, I do know that for me personally, I don't like to share a lot of my personal data. Facebook, well, never mind my pet peeves about it. I mean the whole thing, I'm not a fan of it, just because I do feel like it's very exploitative. And it's something that, again, once our parents got on it, it just ruined the whole thing anyways.
But I do think that there are valid use cases for taking data about a user, maybe not user data, for optimizing your application. So I do think that that does make a ton of sense. I mean, again, what are the most popular products that are being clicked on? If you couldn't record that and then use that to show products, I mean, that would be pretty bad. But knowing your particular preference for certain things, and then being able to sell that to an ad company that can combine it with something else that then can serve you up targeted ads: good for the ad companies, maybe good for you if you're getting really relevant ads. But at the same time, a lot of that just feels dirty and creepy to me when you start getting very specific on profiling individual users.
But hey, all you got to do is read these privacy documents, these terms and conditions and you'll see exactly what they're doing. So if you don't have a problem with it, I mean, it's kind of hard because I think most people would just glaze over them.
So another one, though, another law that has a ton of teeth, and I think is going to be more important given the fact that we are now living in a COVID-19 world and more people are building telehealth apps, or they're building other apps, like even tracking COVID cases and some of these other things. For a very, very long time, there has been a law called HIPAA, right, that is there to protect medical data and all that kind of stuff. Where does privacy play in with that, especially now with all of this medical data, a lot of it being shared?
Mark: Yeah. Yeah. And that's a really interesting example, and I'm glad you brought it up because of the privacy, because there's lots of opportunity here. If you're building an application to service the medical community, and that medical community extends far beyond doctors, it's all the third-party connections and things, they all have to follow HIPAA when it comes to health information. So now we're talking about PHI, personal health information, in addition to personally identifiable information, PII, right? So you have both of these in this application. And HIPAA dictates what you're allowed to do with that health information. And again, it comes down to a lot of transparency required, because us as patients want that data shared. If you give a history to your doctor and your doctor refers you to a specialist, you want your doctor to share that history with the specialist, because you don't want to go to the specialist and take half an hour of that first appointment reiterating what you already told the first person, right?
And when you go get X-rays or an MRI, you want the results of that to be sent back to your doctor. You don't want to walk around with a USB stick and go, "I brought my data, please analyze it," right? So there's a lot of efficiencies to be had. And HIPAA dictates the flow of information between different providers. And that's why there are a lot of consent forms involved with HIPAA. So when you sign up with your doctor, if you go to a GP for the first time, they're going to get you to sign a data-sharing document that basically says they're allowed to share information with other specialists that they refer you to, and with insurers in the States, again, an outlier given how the rest of the world works with insurance.
But the interesting thing, again, is that as a builder, you need to make sure the service you're dealing with is HIPAA compliant, otherwise you cannot be HIPAA compliant. But specific to COVID, HIPAA has an exemption, like most privacy acts, that says if it's in the interest of public health, all of these controls can be foregone and that data can be shared with a centralized health authority. So in the States, that information could be shared with the CDC. So everything you've told your doctor could theoretically be shared with the CDC if it's in the interest of helping prevent the spread of COVID-19.
Now, there are lawyers on every side of it, that is the one advantage of the United States. While you lack the overall frameworks, you have more than enough lawyers to make up for it. So the originator, your doctor's office is going to push that through a lawyer before they release all the information up to the HMO. The HMO is going to go through their legal team before they send it to the CDC and so forth. But it is an interesting exemption saying essentially, I don't necessarily care about your history of back issues or sports injuries or blah, blah, blah but I want to know every patient that has tested positive or had any test for COVID-19 because we need those stats up, right? And we need to roll that up at the municipal level, at the state level and at the federal level, because it's in the public interest.
And in this case, and it's a common challenge and it's always all shades of gray, is your personal privacy is not more important than the general health of the community or of the state or the nation in certain cases. And I think a global pandemic provides a lot of argument on that front, but we had a case here in Canada where law enforcement made an argument early in the pandemic that they wanted a central database they could query that would tell them if someone they were dealing with had tested positive for COVID. And that went up to our privacy commissioner then to our federal privacy commissioner, because their argument, the first responders, the police argument, was we could be potentially exposed and we want to know, which there's validity to that argument. But the flip side was, well, is it enough to breach this person's personal privacy? And the current result the last time I checked was, no, it wasn't.
Whereas the aggregate stat, if I test positive or negative, that stat is absolutely pushed up with not my name, but with the general area where I live. So in my case, my postal code, which is the same as the zip code, that gets pushed up because that's not a breach of my privacy, but it helps the information, right? I shouldn't say it's not a breach, it's a tiny breach compared to the big benefit that the community gets. So fascinating, all shades of gray, no clear answers, but it is an exemption and those are not uncommon. There's also exemptions and privacy laws for law enforcement requests, right? If a law enforcement officer goes to a judge gets a subpoena or a warrant, all the privacy protections are out the window.
Jeremy: Right. Well, I mean, it's funny. I mean, you mentioned sort of, this idea of the contact tracing application and obviously there's privacy concerns around that if it's about specific people, but from the law enforcement perspective, I mean, obviously I'm sure you've been paying attention to what's been going on in the United States, if somebody has preexisting conditions even using a taser on somebody, which is probably a bad idea in most situations anyways, but if there were underlying health concerns that they could say, "Oh, this person does have, I don't know, asthma, or they have a heart condition," or things like that where using different types of force or using different methods could cause more harm than it does good if it does good in some cases, but that would be interesting data to potentially be shared maybe with a police officer or a law enforcement officer or maybe not, right? So, I mean, that's the problem is that, you're right, there's data that could be shared that could be beneficial to the public good, but then on the other side, there's probably a lot that you don't want to share.
Mark: Yeah. So the more common example with law enforcement is outside of the health information is your cell phone, right? So your cell phone location, the question of, so not only getting access to the phone, but the fact that your phone constantly pings the network in order to get the best cell service, right? So at any given time, you're normally within, if you're in the city, there's five cell towers you could be bouncing off of and one or two of them are going to be better than the rest because they're closer physically.
And the question, you see this on TV all the time: depending on the jurisdiction, law enforcement may or may not need a warrant to get your location information. There are request forms for certain cell providers that they can file and get the location of your SIM card right now, or the identifier for your phone, without going through a significant legal process. It's just a simple request, either from the investigating officer or from the DA, instead of going through a judge.
And that's an interesting one because that's never been argued out in public. And I'm a big fan of, there's no wrong answer, you need to just have transparency so people understand. And when decisions are made behind the scenes, because you can see the argument either way, right? If a kid is lost and besides sending out an amber alert, which I think you guys have, we have them here where they blast everybody's phone, instead of sending an amber alert, if you could ping that kid's phone and know where they were, you may be able to retrieve that child. And we know when children are missing, every minute counts, right? The outcomes are significantly better the faster you find that kid, regardless of the situation. But on the flip side, if they're tracking you because they suspect you of a crime, but that hasn't been proven, is that a violation of your rights?
So when it comes to privacy, one of the reasons I love diving into it is because it is all nuance and edge cases and specific examples that override the general rules. But most of it's done behind the scenes, which is the one thing I do not like, because I think it needs to be out in the open so people understand.
Jeremy: Yeah. And I think the other thing that is sort of interesting about this electronic surveillance debate versus the traditional analog thing, I'll give you this example. I remember there was a big to-do, and I think it was going around on Facebook or one of these things, where people were like, do not put your home address into your GPS in your car, because if somebody breaks into your car, then they can just look at your GPS and get your home address. But they could also open your glove box and take out your registration that has your home address on it, right? So that's the kind of thing where that, to me, was kind of dumb.
Now, the other thing, going back to the police example, is that I read somewhere, and again, I'm really hoping this is true because it's so hard to trust information nowadays, that if a police officer or a detective wanted to wait outside someone's house and follow them, that's perfectly legal for them to do. If they have some reasonable suspicion, they can follow somebody, they can tail somebody, whatever they call it, they can do that. But to put some sort of tracking device on their vehicle, that they can't do, although it's sort of the same thing, except one requires a human to be watching and the other one doesn't. And again, I'm not a huge fan of surveillance, so I'm definitely on the more restrictive side of these things. But at the same time, those arguments just seem really strange to me, that it's legal in one sense if it's analog, but it's illegal if it's digital.
Mark: Yeah. And an even clearer example: in most states, and again, not a lawyer, law enforcement requires a warrant to get the passcode or password for a user's phone, but does not require a warrant to use a biometric unlock. So I can force you to use your thumb or your face to unlock your phone, but I can't force you to give me your passcode. Both unlock the phone, right? The passcode, the biometric, I mean, there are technical differences in the implementation, but at the end of the day, you're doing the same thing. But it's the difference between something you are and something you know, and you can't be compelled to incriminate yourself in the United States. So, there's that difference, right? And if you go to the border, all of this is moot, because there's an entire zone at the border where all your rights are essentially suspended.
I think it comes down to the transparency, but for all the examples you just mentioned, the law is always about 10 to 15 years behind technology. So this comes back to one of my core experiences as a forensic investigator. Now, I've never testified in court, but I'm qualified to. Most of the reports and things that I worked on were at the nation state level and never get to court. But the interesting thing there is the number of cases I've reviewed from court findings where forensics were done, and they're all over the map, right? Like the case of, "Oh, an IP is an identifier." And I'm like, an IP is not an identifier of a specific device if it goes into a building that has 150 devices behind a NAT gateway. Which of those devices committed the act in question? You can't prove it with just an IP address, right?
But it's been accepted in a ton of court cases, because, again, and this comes back to privacy as well, the law is way behind the capabilities. And this is the challenge of writing regulations, of writing law: you need to keep it high enough that it's principled, and then use examples and precedent for the specific technologies, because they keep changing. Because yeah, some of this stuff, like the tailing example, is such a ridiculous difference between the two, but it is legally a difference. Similarly, to tie this back to the main audience of builders building stuff in the cloud, especially around serverless, how we handle passwords is always very frustrating for me. So let me ask you a question. Why does a password field obscure the information you're entering into it? This is not a trick question, I'm not trying to put you on the spot.
Jeremy: If people looking over your shoulder can't see it, or so that you don't remember what password you typed in.
Mark: Both are true, but yes. So the design of that security control is to prevent a shoulder surf, right? To prevent somebody looking over your shoulder while you're typing it in. So the question is, why do we have that for absolutely every password field everywhere, when outside of certain situations the likelihood of somebody looking over your shoulder is very low? Which is why I love the "show my password" option, which amazon.com has, and a bunch of other people are starting to add, to allow users to reduce the number of errors they make. Because if I'm physically alone in my office, this is a real background, nobody is looking over my shoulder and watching my password. So why can't I see what I'm typing, right? Similarly, when people say, "I'm going to prevent copy and paste into a password box."
Jeremy: Oh, I hate that.
Mark: Absolutely. Prevent copy a hundred percent of the time. But paste is how password managers work. And what's the threat model around pasting into a box, right? So you paste the wrong password, who cares? So understanding the control and why you're doing it is really, really critical. Same thing on passwords, which is why people freak out when I tell them, "Well, write it down. If you don't have a password manager, write it down on a piece of paper." They're like, "What? Oh my God, why would I do that?"
Well, writing it down on a piece of paper and putting it under your keyboard in an office is a dumb idea, because that's not your environment. But we're both at home, and if I put my password under my keyboard, it's my kids and my partner who are the threat, or potentially someone I invite into my home. But if I've already invited them into my home, or it's someone in my family, there's a bunch of other stuff they already know anyway, whereas having it written down actually helps me. Again, no bad decisions; understanding the implications of your decisions and making them explicitly covers security, and it covers privacy.
Jeremy: Right. And I can think of 10 people off the top of my head that probably could answer every security question that I've ever given to a bank or something like that because they know me.
Mark: Exactly, right?
Jeremy: And I think that's a good point about some of these security controls we put into place that are, I guess, again there's just friction and it gives security a bad name. My payroll interface that I use is all online, and whenever I have to have a deposit for my taxes, it tells me how much money it's going to take out of my account for my taxes. Well, I move money into separate accounts for those certain things. So I like to take that amount, copy it, and paste it into a transfer window on another browser in order to transfer that money, so that it's in the account that will be deducted from. I cannot copy from that site, it won't let me copy information from that site. And I think to myself, "Why? I can print it to a PDF and copy it from the PDF. I can print it, I can do other things." So why do you add that level of friction that potentially creates mistakes? Like you said, which is why that show password thing is so important.
So anyways, I want to go back to the HIPAA thing for a second, because this is something where we may have gotten a little off topic. I think it was all great discussion, this stuff to me is fascinating. But the point that I wanted to get to with the HIPAA is, if I'm sharing your x-rays, okay, I get it, I've got to be HIPAA compliant. But where is the line for these builders that are building these peripheral applications around medical services, medical devices, medical professional buildings, hospitals, whatever. Where's the line? Because I think about an application that says...
You see this all the time, I just started dealing with this. We just got a new dentist, my old dentist retired, he was completely analog, I don't even think they had an email address. Everything was phone calls, I mean, they were excited when they could print out a little paper card and give it to you with your next appointment on it. So I moved to a new dentist. This new dentist has a hundred percent online scheduling, right? It's great, you pick your hygienist, you can say when you want to set your appointment. And I think about this for doctor offices as well, because I know with my doctor's office it's through a larger, I don't know, coalition or whatever it is. And so they have this health center that you can log into. I don't think you can make appointments, but there's some stuff there.
But let's say someone's building a simple application that is just a scheduling app, right? Maybe you're a little doctor's office or a dentist's office, whatever, and you want this scheduling capability. So if I go and I allow this scheduling, if I'm booking an appointment with a general physician or a general practitioner, okay, probably not that big of a deal. But what if I'm booking with an oncologist? What if I'm booking with an obstetrician? What if I'm booking with Planned Parenthood or something like that? That gets into really specific things about, obviously, my health, or my spouse's health, my kids' health, whatever it is. When you start booking with specific types of doctors, even though you're saying, "Well, we're not sharing any information about your medical records," that reveals a lot, right? So when does that get triggered? When does HIPAA get triggered?
Mark: Yeah. And you'd have to consult a lawyer to get the actual official answer, because it's case by case. And I always have to say that; most of my conversations start with big disclaimers. The challenge here, if I take this from a builder perspective, so if we're focusing on the audience who's listening, who's watching, they're probably building applications like this or interacting with them. It is easier to take a stricter approach from the builder side, because you're never going to regret having taken more precautions, and if you do it early, you're not introducing friction. So treating everything as personal health information or personally identifiable information is going to give you a better outcome. Because if you treated something as health information and it turned out it wasn't, the cost is almost minimal to you when you're designing it in from day one.
Because even, you said, well, the GP is not that big of a deal. Well, seeing the doctor is still a big deal, because it means something is of concern, even if it's just a checkup. But if you have a notes field where you ask why the appointment is being requested, Lord knows what people are going to type in there, right? Because they're going to assume this is just between them and their doctor, so they'll write, "Well, I have a lump on my neck that I want to get checked out." Oh my God, that right there is diagnostic or symptomatic information, which is health information, right? And even if they just said, "Oh, everything's fine," well, that can still be treated as profile information. Now the problem is, just like in most of the privacy legislation, there's this concept of user data and data about the user.
So, HIPAA mainly focuses on user data, which is, again, what you're typing in and the specific entry. So to the point you just raised, now I believe this to be true, but I'd have to double-check: the fact that you're seeing a doctor is not necessarily protected under HIPAA. So the fact that you've booked in with the oncologist, or the pediatric surgeon, or whatever the case may be, is not a specific class of data that needs to be protected.
The information you share with them, your diagnostic results, all your blood work, all that kind of stuff absolutely is. But the fact that you just booked an appointment or spoke to them on the phone isn't necessarily protected. I believe it should be, because as an attacker, it's very clear that I don't care what type of cancer the target has, I just care that they have it at all. Because that's something I, as a cyber criminal, can manipulate to get what I want, right?
So that's a big problem, that there's not that line. Similarly, a different example under medical is genetic testing, right? So 23andMe, ancestry.com: hey, test your genes at home. They all advertise, "Hey, we keep your data super secure. We protect your health information," and blah, blah, blah. But they aggregate your genetic code and use it for a whole bunch of stuff in the back end. And they say, "Well, we don't tie it to you." Well, it's easy enough, if somebody gets that piece of information and has access to the backend systems anyway, to then tie it to you. And that's the challenge we deal with, with all this data and specifically with healthcare: it's very rarely one piece of information that is the biggest point of concern. It's the multiple pieces of information that I can put together in aggregate to get a better picture of what I wanted as a malicious actor.
So I don't care that you spoke to this doctor, but I do care that you went from no doctor's appointments to five in a month, right? Because I don't know what's wrong, but I know it's something big, because who sees five doctors in a month after never seeing a doctor in the last year, right? Something is happening. And so if you put your bad guy hat on: if I'm trying to break into the government, and you work for the government, and I realize there's something wrong, there's a good chance that I can make a cash offer, knowing that you're looking at a mountain of medical bills, and I could probably compromise you that way and say, "Hey, here's a million bucks, I need your access," and I'm in. Even not knowing what was wrong, just knowing that the pattern has changed.
So it's again, a lot of nuance and a lot of challenge. But from a builder perspective, if you treat everything in a health application as PHI, your cost is not going to increase significantly. If you plan early enough, your friction isn't going to increase, but you're definitely going to protect your users to a higher level, which is actually a competitive differentiator as well.
Jeremy: Yeah. Yeah. Well, I think the best advice out of that is, seek professional legal help for those sorts of things. And that's the thing that just makes me a little bit nervous. Whenever you're building something new, you might have a great idea, or you're going down a different path, you're pivoting a little bit, that when these things come up, you do need to have legal answers and solid legal advice around these things to protect yourself.
All right, we've been talking for a very long time, and thinking back about what we've talked about, we've probably scared the crap out of people thinking, "Oh my goodness, I'm not doing this anymore." But let's bring it back down, because again, I don't think anything we talked about is hyperbole. I think all of these things are very, very real. The laws are real, the compliance regulations are real. Just this idea of making sure that the data that's being saved is encrypted, and that you put these levels of control into place. Those are all very, very real things.
But you had a talk back last year at Serverlessconf New York, and I thought it was fascinating, because essentially what you said was, "The sky is not falling." We are not suddenly opened up to all these new attack vectors; yes, there are all these kinds of possibilities. So let's rebuild everyone's confidence now that we've broken it all down, and let them know that, again, building in the cloud and building in serverless, yes, you have to follow some security protocols and you have to do the right things, but it's nowhere near the gloom and doom that I think you see a lot of people talking about.
Mark: Yeah, for sure. And I think that's absolutely critical. If there's a second takeaway besides finding legal advice, it's this: stop worrying so much. And I think that's where I have challenges and interesting discussions with my security contemporaries, because when we're talking amongst ourselves, we talk about really obscure hacks, interesting vulnerability chains, zero-day attacks, criminal scams, all this kind of stuff, because we're a unique set of niche experts talking about our field, right?
Whereas, when you're talking about general building and trying to solve problems for customers and things like that, you have to look at likelihood. Because risk is really two things: it's the probable impact of an event and the probability that that event will occur. So security is very good, outside of our security communities, about talking about the probable impact of an event. We say, "Oh my God, the sky is falling. If this happens, your entire infrastructure is owned." But what we don't talk about is the likelihood of that happening. Because if you say, "Yeah, the chances are one in 2 trillion," you're like, "Well, I don't care then."
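Mark's framing of risk as impact weighted by likelihood can be sketched in a few lines of code. The scenarios and dollar figures below are invented purely to illustrate the point that the mundane, likely event usually dominates the expected loss:

```python
# A toy illustration of the risk framing above: risk is the probable
# impact of an event weighted by the probability that it occurs.
# The scenarios and numbers here are invented for illustration only.

def risk(impact_dollars: float, probability: float) -> float:
    """Expected loss: probable impact times probability of occurrence."""
    return impact_dollars * probability

# An exotic zero-day chain: catastrophic impact, vanishingly unlikely.
exotic_hack = risk(impact_dollars=10_000_000, probability=1 / 1_000_000_000)

# A misconfigured public S3 bucket: smaller impact, far more likely.
open_bucket = risk(impact_dollars=250_000, probability=0.05)

# The mundane mistake dominates the expected loss.
print(f"zero-day chain: ${exotic_hack:.2f}, open bucket: ${open_bucket:,.2f}")
```

The exact numbers don't matter; the ordering does, which is Mark's point about focusing on misconfigurations before obscure attacks.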
It's interesting, nerd me is like, "That's interesting and I like that." But the reality is that often, the simple things that cause... So if you look at the S3 buckets you mentioned in one of the early questions. I followed those breaches very, very closely. If you want to follow at home, Chris Vickery from UpGuard, his career over the last couple of years has been focused almost exclusively on S3 bucket breaches, fantastic research from him and his team. In every single case, it has not been a hack that has found the data, it has been simply that it was accidentally exposed.
Given the probability, yes, zero-day vulnerabilities are real, cyber crime and attacks are real. You will see them over the course of your career. Your infrastructure will be attacked simply because it's connected to the internet, that's just the reality. But you actually don't have to worry about that nearly as much as you think, if at all. What you need to focus on is building well. Building good, resilient systems that are stable, that work reliably, and that do only what you want them to do is going to fix 95% of the security issues out there, and then you can worry about the other stuff. So the S3 buckets are just mistakes, they're just misconfigurations. Even Capital One, who unfortunately got hit with an $80 million fine because of it, it was a far more complicated mistake, but it was still a mistake.
Basically, they had a WAF that they custom built on an EC2 instance in front of S3, and that WAF's role had too many permissions. That's it, right? A mistake was made and it cost them, literally and reputationally. So the likelihood is that you are going to mess up. So put tools in place: things like Cloud Custodian, the great open source project, a little testing around your security configurations, AWS Config, Google's Security Command Center, all these tools that are at your disposal that are free or almost no cost, to help prevent mistakes. The idea to keep in your head as a builder is that you drew it up in PowerPoint, that's great. You drew it on the whiteboard, that's fine. You need something that checks to make sure that production is what you drew, and that's going to cover the vast majority of your security concerns. And if you get that done, then you can worry about the obscure cool stuff.
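The "make sure production is what you drew" idea boils down to diffing a declared desired state against the live one, which is what tools like AWS Config and Cloud Custodian automate. As a minimal sketch of the concept, the "actual" state below is a stubbed dictionary standing in for a real API response, and the bucket names and settings are made up:

```python
# A toy sketch of checking that production matches the architecture
# you drew. Real tooling (AWS Config, Cloud Custodian) fetches the
# live state; here it's a hand-written dict for illustration.

DESIRED = {
    "uploads-bucket": {"public_access_blocked": True, "encrypted": True},
    "logs-bucket": {"public_access_blocked": True, "encrypted": True},
}

def find_drift(desired: dict, actual: dict) -> list:
    """Return human-readable descriptions of settings that drifted."""
    problems = []
    for bucket, want in desired.items():
        have = actual.get(bucket, {})
        for setting, expected in want.items():
            if have.get(setting) != expected:
                problems.append(
                    f"{bucket}: {setting} is {have.get(setting)}, expected {expected}"
                )
    return problems

# Simulated production state: someone opened up the logs bucket.
actual_state = {
    "uploads-bucket": {"public_access_blocked": True, "encrypted": True},
    "logs-bucket": {"public_access_blocked": False, "encrypted": True},
}

for problem in find_drift(DESIRED, actual_state):
    print(problem)
```

Running a check like this on a schedule, or on every deploy, is the cheap insurance Mark describes against exactly the S3 bucket class of mistake.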
Jeremy: Right. Yeah. And I think you're totally right. I think that if you cover the bases, and again, just like I said, especially with serverless, it almost all comes back to application security. I did an experiment, two years ago at this point, where I was able to upload a file to S3 where the file name itself was a SQL injection attack, basically. And so when a piece of code that didn't think about SQL injection, because maybe it thought the source was trusted or whatever, tried to load the ID or the file name, that's where the problem was. How many people are going to even try that one? That's one thing.
And then the other thing is that, again, if you're building solid applications and following best practices, you should be thinking about SQL injection. That's a very real thing, and of course, there are so many tools now that you just shouldn't be introducing SQL injection anymore, but people still do. But again, I think there is a sense of doom and gloom, or FUD, you know what I mean? Trying to get people to buy these security applications and things that they do. Because again, I think that if you don't think there's a problem, you're not going to buy a solution to fix it, right?
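The standard fix for the filename experiment Jeremy describes is parameterized queries, so the database driver binds untrusted input as data rather than splicing it into the statement. A minimal sketch using Python's built-in sqlite3 module, with an invented hostile object key:

```python
import sqlite3

# Never interpolate untrusted input (even a "trusted" S3 object key)
# into SQL. Use placeholders so the driver treats the value as data,
# not as part of the statement. Table and key are made up for the demo.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (filename TEXT)")

# A hostile object key, in the spirit of the experiment described above.
filename = "report.pdf'; DROP TABLE uploads;--"

# Vulnerable pattern (don't do this): string formatting splices the
# payload directly into the statement text.
#   conn.execute(f"INSERT INTO uploads VALUES ('{filename}')")

# Safe: the ? placeholder binds the value, payload and all, as a string.
conn.execute("INSERT INTO uploads (filename) VALUES (?)", (filename,))

row = conn.execute("SELECT filename FROM uploads").fetchone()
print(row[0])  # the raw key is stored harmlessly as data
```

The same placeholder discipline applies whatever the driver or ORM, which is why "the source is trusted" should never be the reason a query skips it.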
And for the very few people I think, who do get hacked, they're like, "Oh, I really wish I bought that solution." So I don't know what the right level of advice is. I think you're right, as to say, your likelihood is very low, but I still think people should think about security and put that first in a way that says, yes, maybe I don't have to worry about the obscure attack this way, but I should just make sure that I'm following best practices. And like you said, I like that idea of using open source projects to make sure that your infrastructure, or your proposed infrastructure and your actual infrastructure do match.
Mark: Yeah. So let me say this, coming from a vendor. Trend Micro is obviously a vendor, one of the top vendors out there. I don't agree with everything we put out from a marketing perspective, because sometimes it does skew negative. We try not to, but it still happens, because like you said, if you don't believe there's an issue, you're not going to buy a product, and all the vendors are generally guilty of this.
But I think it's a misunderstanding of what security's role is in the build process and in building applications. And I think if you're sitting at home right now, you're listening to this, thank you for sticking along, it's been a long episode, or broken up into a couple. But I think it's all important and it's interesting, but it's not just academic, like you said. There's real issues here, there are real things that are going on, but the best way to look at security controls is actually from a builder's perspective.
You mentioned SQL injection. There are open source, fully tested, phenomenal input validation libraries for every language out there. There is no reason you should ever write your own. You should just import one of these, there are all sorts of different licenses available as well, and so you import that and use it to check your validation. Because I think that is an example of the larger role of security controls.
Security controls can help ensure that what you're writing does what it's supposed to, and only that. So input validation is a great example. If I have a Lambda function that takes an input that is a first name and a last name, well, I need to verify that I'm taking in a valid first name and last name, and not a SQL injection command, not a picture, not somebody trying to upload a data file, and things like that. That's a security control.
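Mark's Lambda example can be sketched as a handler that validates its inputs before doing anything else. The regex below is deliberately simplistic (real names are messier, and Mark's advice is to lean on a well-tested validation library rather than hand-rolling one), and the event shape and field names are assumptions for illustration:

```python
import re

# A minimal sketch of the input-validation point above: a handler that
# accepts a first and last name should verify it got a plausible name
# and nothing else. The pattern is illustrative, not production-grade.

NAME_RE = re.compile(r"^[A-Za-z][A-Za-z' -]{0,63}$")

def validate_name(value: str) -> bool:
    """Accept only strings shaped like a name, up to 64 characters."""
    return bool(NAME_RE.fullmatch(value))

def handler(event: dict) -> dict:
    first = event.get("first_name", "")
    last = event.get("last_name", "")
    if not (validate_name(first) and validate_name(last)):
        # Reject anything that isn't shaped like a name: SQL fragments,
        # file contents, oversized strings, and so on.
        return {"statusCode": 400, "body": "invalid name"}
    return {"statusCode": 200, "body": f"hello {first} {last}"}
```

The point is the shape of the control, validating at the boundary so only the inputs you expect ever reach the rest of the function, not this particular regex.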
And if you see it not as trying to stop bad stuff, but as making sure that what you want to happen is the only thing that's happening, you start to adjust how you see these security controls. Take anti-malware, the classic security control. You're like, "Well, I'm not going to get attacked by malware." Don't think of it as something to stop attacks, even though it will. Think of it as: I'm an application that takes files from the internet, there are bad things on the internet, and I cannot possibly write a piece of software, in addition to building a solution, that is going to scan this file to make sure that only good things are in there.
Well, there's an entire community of security vendors that do that for a living. Pay for one of those tools, not to stop attacks, but to make sure the data you're taking in is clean. So when you adjust that kind of thinking and realize the security controls are just there to help you make sure what you think is happening is actually what's happening, you start to change your perspective and go, "Well, there's great open source stuff. There's great stuff for purchase, but I don't need to buy anything and everything. I don't need all this crazy advanced threat hunting stuff. I just need to make sure what I'm building does what I want and only that."
Jeremy: And I think if we tie this back to the original topic here, the privacy of your users' data is going to depend on the security measures and the rules that you follow to make sure that that data is secure. So Mark, listen, thank you so much for spending all this time with me and sharing that perspective. I mean, this was an episode I was really excited about doing, because I do think these are things that people don't necessarily think about. I know we didn't talk a ton about serverless, but I really feel like it all ties back to it, and to cloud development in general. So again, thank you so much for being here. If people want to find out more about you, watch your video series, things like that, how do they do that?
Mark: Yeah. And thank you for having me, I've really enjoyed this conversation and hopefully, the audience and we can all continue this conversation online as well. You can hit me up on Twitter and most social networks @marknca. My website is markn.ca and everything's linked up from there, my YouTube channel and all that is there as well. So happy to keep this conversation rolling in the community as well, because yeah, even though we weren't specifically talking about a lot of serverless aspects, I think the principles apply to everybody. The good news is, if you're building in a serverless environment and serverless design, you're already way further ahead than most people, because you've delegated a lot of this to the cloud providers, which is a massive win, which is one of the reasons I'm such a huge fan of the serverless community.
Jeremy: Awesome. All right. Well, we'll get all your contact information into the show notes. Thanks again, Mark.