Can We Stop With Those Horrible “System Overview” Dashboards Already?
https://betterprogramming.pub/can-we-stop-with-those-horrible-system-overview-dashboards-already-5ea10a28fecf
Recruiting developers into Site Reliability Engineering (SRE)
https://www.srepath.com/recruiting-developers-site-reliability-engineering-sre-guide
Bad Observability
Observability has become a bit of a buzzword in the industry over the last few years. Exactly what "observability" means depends on who you ask, but most people would agree it's about both:
- Being able to observe customer experience and behavior
- Being able to observe and understand what's happening within our technology solutions
There's plenty of content out there telling you how to implement observability, or what good looks like. But what about bad observability? What are some anti-patterns to watch out for?
https://squaredup.com/blog/slight-reliability/bad-observability
What Are Structured Logs and How Do They Improve Performance?
Logging information in a structured format for better analysis and processing of log data
https://betterprogramming.pub/why-you-should-use-structured-logging-format-47a388711316
Need your own incident post-mortem template? Here’s ours
Having a dedicated incident post-mortem is just as important as having a robust incident response plan. The post-mortem is key to understanding exactly what went wrong, why it happened in the first place, and what you can do to avoid it in the future.
It’s an essential document, but many organizations either haphazardly put together post-incident notes that live in disparate places or don’t know where to start in creating their own post-mortems. To help, we’re sharing the incident post-mortem template that we use internally.
This template outlines our “sensible default” for documenting any incident, technical or otherwise. We believe it strikes a healthy balance between raw data, human interpretation, and concrete actions. And we say “sensible default” because it’s rare that this will perfectly cover the specific needs of your organization, and that’s fine. Think of this as a starting point for your own incident post-mortem document.
Within each section, we’ve outlined the background on what it’s for, why it’s important, and how we advise you to complete it.
https://incident.io/blog/incident-post-mortem-template
Seamless critical traffic migration with CoreDNS request rewrite feature
https://engineering.mercari.com/en/blog/entry/20221213-seamless-critical-traffic-migration-with-coredns-request-rewrite-feature
Site Reliability Engineer (SRE) Interview Preparation Guide
This repository is an attempt to consolidate useful resources for Site Reliability Engineer (SRE) interview preparation.
https://github.com/mxssl/sre-interview-prep-guide
The life of a DNS query in Kubernetes
https://www.nslookup.io/learning/the-life-of-a-dns-query-in-kubernetes
Devopedia
Devopedia is an open community platform for developers by developers to explain technology in a simple, clear and unopinionated way.
https://devopedia.org
mox
Mox is a modern full-featured open source secure mail server for low-maintenance self-hosted email.
https://github.com/mjl-/mox
Email explained from first principles
This article covers all aspects of modern email.
https://explained-from-first-principles.com/email
Taking the fear out of migrations
Over the last 18 months at incident.io, we’ve done a lot of migrations. Often, a new feature requires a change to our existing data model. For us to be successful, it’s important that we can seamlessly transition from the old world to the new as quickly as we can.
https://incident.io/blog/how-we-run-migrations
Linkerd at loveholidays
Our journey to a production service mesh - https://tech.loveholidays.com/linkerd-at-loveholidays-our-journey-to-a-production-service-mesh-9a6cd478d395
Monitoring our apps using Linkerd metrics - https://tech.loveholidays.com/linkerd-at-loveholidays-monitoring-our-apps-using-linkerd-metrics-fa44c13bee49
Understanding Docker's -net=host Option
https://www.metricfire.com/blog/understanding-dockers-net-host-option