Podcast: Break Things on Purpose | The Hill You'll Die On
Break Things on Purpose is a podcast for all things Chaos Engineering. Check out our latest episode below.
You can subscribe to Break Things on Purpose wherever you get your podcasts.
If you have feedback about the show, find us on Twitter at @BTOPpod or shoot us a note at podcast@gremlin.com!
In this episode of the Break Things on Purpose podcast, we ask our guests for their strong opinions.
Episode Guests
- Brian Holt (@holtbt): full episode
- Jérôme Petazzoni (@jpetazzo): full episode
- J Paul Reed (@jpaulreed): full episode
Transcript
Brian Holt: Oh, and Angular sucks. That's, no. No, I'm kidding. That's a joke. That, Angular is fine, but I don't want to write it anymore.
Jason Yee: Welcome to Break Things On Purpose, a podcast about Chaos Engineering with a generous splash of strong opinions. We're currently on mid-season break and busy recording new episodes, but we wanted to share some additional content with you. We've asked guests from this season to share their strongly held opinions and tell us what's the hill they'd die on. Up first, Brian Holt.
Brian Holt
Patrick Higgins: I need to know what's the hill that you'd die on.
Brian Holt: A technical hill, right? I got lots of other hills I'd die on, but...
Patrick Higgins: Just one that won't get you canceled.
Brian Holt: Okay. That's good context. Almost all the programming languages that we talk about, which, you know, Node, Python, PHP, Go. Most of these can scale bigger than your app is going to get.
Like, we all love to argue about, it's like, "Oh, you chose PHP. That's not going to scale." It's like, "Yeah? Get lost!" Like, they got GitHub to scale Ruby to enormous size. Right. And same with Twitter. And so I'm so sick of having the argument of, "don't pick this", you know, "we need to write everything in Rust right now." I just don't believe it. Like, you don't have the scale, you don't have the technical needs, most people, to really have it come down to the language level. And that's, that's a hill I'll die on.
Jérôme Petazzoni
Jason Yee: Jérôme Petazzoni shared some of the challenges that the Docker team faced in their early years. Here's his advice:
Jérôme, what's the hill you want to die on?
Jérôme Petazzoni: I think we have to stop unconditionally advocating for using HTTPS everywhere, especially for package updates. And I already had some folks asking me about this recently and it was like, yeah, when you say, "Oh, it's fine to distribute updates on HTTP," folks are like, "But, but, but all the attacks we can do on that!"
And I'm like, no, no, no, you don't understand. What I'm saying is: Yes. Distribute the metadata, the list of packages, the checksums, the signatures, the keys, the hashes, all that stuff. Yeah, distribute it over like TLS, have it certificate protected, et cetera. But the bits, like the big bits, distribute that over whatever, like HTTP, FTP, etc., because we can cache that stuff like there is no tomorrow and we can cut down on hosting costs.
I don't know exactly how much the Docker Hub hosting bill is, but I had some back of the envelope calculations at some point. And I was like, yeah, we're probably looking at a few million [dollars] a month, between like a S3 and CloudFront and etc.
Now let's look at the Debian repo hosting costs. And the answer is very close to zero because every single university and hoster wants to have a Debian mirror and it's pretty easy to do so. And part of me feels like it's a big tragedy of the commons that when we designed that whole registry protocol at Docker, we didn't have that built at the core of it from day one. So I'm not saying that this is the sole reason that tanked Docker, Inc., certainly not. But I want to think it still contributed a little bit, you know, like death by a thousand cuts.
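The split Jérôme describes (small, trusted metadata over TLS; large payloads over any cacheable transport) can be sketched in a few lines. This is only an illustration under stated assumptions, not how Docker or Debian implement it: the names `trusted_index`, `fetch_untrusted`, `install`, and `IntegrityError` are hypothetical, and a real system would also verify a signature on the index itself.

```python
import hashlib
import hmac


class IntegrityError(Exception):
    """Raised when downloaded bytes do not match the trusted checksum."""


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_blob(blob: bytes, expected_sha256: str) -> bool:
    """Check a blob against a checksum obtained over a trusted channel."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sha256_hex(blob), expected_sha256.lower())


def install(name, trusted_index, fetch_untrusted):
    """Fetch a package over an untrusted transport and verify it.

    trusted_index: dict of package name -> SHA-256 hex digest. Assumed to
        have been fetched over TLS (hypothetical; not a real registry API).
    fetch_untrusted: callable returning the package bytes. It stands in
        for plain HTTP, FTP, or a third-party mirror.
    """
    expected = trusted_index[name]
    blob = fetch_untrusted(name)
    if not verify_blob(blob, expected):
        raise IntegrityError(f"checksum mismatch for {name}")
    return blob
```

Because the checksum arrives over the protected channel, a mirror or cache that serves altered bytes is caught at verification time. That is what lets the large files sit on any mirror that offers to host them.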
J Paul Reed
Jason Yee: J Paul Reed joined the show to chat about resilience engineering and root cause.
J Paul Reed: Okay. So I have five dirty words of continuous improvement and root cause is one of them. And you can see my rant there.
But at this point, the thing that just makes my head explode is you talk to folks and they're intelligent folks and software engineers and you'll be like, "Hey, do you believe in complex systems? Is your software system like super simple?"
And they're like, "No! It's got lots of ins and outs and lots of things going on. There's stuff going on there." Right.
And then you're like, "Okay! Well, in a complex system, there's no such thing as a single root causal factor, right? There's contributing factors." And then, and then they're like, "Let's just boil it down to one root cause." Right. And so that's the hill I'm going to die on.
Listen, if you are one of those people that says, "I work in a complex system," then stop saying root cause please!
That's the hill I'm going to die on.
Jason Yee: If you'd like to hear more from these guests, you can find their episodes on our website at gremlin.com/podcast or on Apple Podcasts, Spotify, or wherever you listen to your favorite podcasts. We'll have new episodes soon. So follow us on Twitter @BTOPpod or subscribe to the podcast to be notified when they air.
Our theme song is called "Battle of Pogs" by Komiku and is available on loyaltyfreakmusic.com.