DevOps&SRE Library

SRE Interview Prep Plan (Week 1)

The idea of interviewing for an SRE role can seem intimidating. These jobs are highly competitive in the current market and you need to demonstrate skills across a ton of technical areas. This 6-week plan we've put together…

SRE Interview Prep Plan (Week 2)

This week is dedicated to providing you with the skills and knowledge to automate routine tasks, create scripts to solve complex problems, and manage infrastructure as code. As we look at scripting languages like Python and Bash, and explore the various Infrastructure as Code (IaC) platforms like OpenTofu and Ansible, you'll discover how automation forms the backbone of SRE's capability to manage large-scale, reliable services. The upcoming days are going to be a blend of learning, practicing, and grokking the art of automation and scripting.

https://www.codereliant.io/sre-interview-prep-plan-week-2

Terraform AWS Provider — Everything you need to know about Multi-Account Authentication and Configuration

https://hector-reyesaleman.medium.com/terraform-aws-provider-everything-you-need-to-know-about-multi-account-authentication-and-f2343a4afd4b

Overhauling AWS account access with Terraform, Granted, and GitOps

Duckbill breaks down their method of accessing thousands of client AWS accounts in a way that preserves ease of access, maintains data confidentiality, and still provides all the permissions needed.

https://www.duckbillgroup.com/blog/overhauling-aws-account-access-with-terraform-granted-and-gitops

Adopt Open ID Connect (OIDC) in Terraform for secure multi-account CI/CD to AWS

https://hedrange.com/2023/10/07/adopt-open-id-connect-oidc-in-terraform-for-secure-multi-account-ci-cd-to-aws

terrareg

Open source Terraform Registry.

https://github.com/MatthewJohn/terrareg

Sofia’s Observability Odyssey: The Do’s and Don’ts for Effective Observability

https://medium.com/@letathenasleep/alerting-the-dos-and-don-ts-for-effective-observability-139db9fb49d1

System Design 101

Explain complex systems using visuals and simple terms.

Whether you're preparing for a System Design Interview or you simply want to understand how systems work beneath the surface, we hope this repository will help you achieve that.

https://github.com/ByteByteGoHq/system-design-101

Vulnerability Management at Lyft: Enforcing the Cascade - Part 1

Over the past 2 years, we’ve built a comprehensive vulnerability management program at Lyft. This blog post will focus on the systems we’ve built to address OS and OS-package level vulnerabilities in a timely manner across hundreds of services run on Kubernetes.

https://eng.lyft.com/vulnerability-management-at-lyft-enforcing-the-cascade-part-1-234d1561b994

krakend-ce

KrakenD Community Edition: High-performance, stateless, declarative, API Gateway written in Go.

https://github.com/krakend/krakend-ce

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot.

https://github.com/TabbyML/tabby

Setting up your first EKS cluster on AWS: some practical tips

https://medium.com/@benjamin.christmann_12432/setting-up-your-first-eks-cluster-on-aws-some-practical-tips-60400963c588

A Guide to Kubernetes Application Resource Tuning

Part 1: https://medium.com/@vvsevel/a-guide-to-kubernetes-application-resource-tuning-part-1-bf0ba04db10

Part 2: https://medium.com/@vvsevel/a-guide-to-kubernetes-application-resource-tuning-part-2-1d287479b52b

Part 3: https://medium.com/@vvsevel/a-guide-to-kubernetes-application-resource-tuning-part-3-40f7f6510c93

AKS Networking Deep Dive: Kubenet vs Azure-CNI vs Azure-CNI (overlay)

https://inder-devops.medium.com/aks-networking-deep-dive-kubenet-vs-azure-cni-vs-azure-cni-overlay-a51709171ce9

GitOps using Flux and Flagger

https://dev.to/infracloud/gitops-using-flux-and-flagger-15ci

Kubernetes Services: ClusterIP, Nodeport and LoadBalancer

https://sysdig.com/blog/kubernetes-services-clusterip-nodeport-loadbalancer

Lessons Learned from Twenty Years of Site Reliability Engineering

Or, Eleven things we have learned as Site Reliability Engineers at Google

1. The riskiness of a mitigation should scale with the severity of the outage
2. Recovery mechanisms should be fully tested before an emergency
3. Canary all changes
4. Have a "Big Red Button"
5. Unit tests alone are not enough - integration testing is also needed
6. COMMUNICATION CHANNELS! AND BACKUP CHANNELS!! AND BACKUPS FOR THOSE BACKUP CHANNELS!!!
7. Intentionally degrade performance modes
8. Test for Disaster resilience
9. Automate your mitigations
10. Reduce the time between rollouts, to decrease the likelihood of the rollout going wrong
11. A single global hardware version is a single point of failure

https://sre.google/resources/practices-and-processes/twenty-years-of-sre-lessons-learned

How DoorDash Migrated from StatsD to Prometheus

https://doordash.engineering/2023/08/01/how-doordash-migrated-from-statsd-to-prometheus

How to use Terraform test

Terraform v1.6.0 introduces a test framework named “Terraform test”. Here’s how to use it.

https://blog.captaincy.io/how-to-use-terraform-test

Terraform project structure with reusable modules

https://erudinsky.com/2023/10/20/structuring-terraform-projects

cluster.dev

Cluster.dev is an open-source tool designed to manage cloud native infrastructures with simple declarative manifests called infrastructure templates. The templates can be based on Terraform modules, Kubernetes manifests, Shell scripts, Helm charts, Kustomize and ArgoCD/Flux applications, OPA policies, etc. Cluster.dev sticks those components together so that you can deploy, test, and distribute a whole set of components with pinned versions.

https://github.com/shalb/cluster.dev
