DevOps&SRE Library

etcd: getting 30% more write/s

My team at Zendesk looks after around 30 Kubernetes clusters. These are all self-managed, meaning we maintain the API servers and, as you may guess, etcd.

Recently, I had a task to do some performance analysis on our etcd clusters. It had been a while since we ran any sort of benchmarking. Plus I wanted to get my hands dirty as I haven’t got much experience in tuning databases.

I ended up getting about a 30% increase in performance, and along the way I learnt a lot about how databases and, by extension, disks work together.

https://zendesk.engineering/etcd-getting-30-more-write-s-318bcdbf7774

How to Monitor CoreDNS

CoreDNS is a DNS add-on for Kubernetes environments. It is one of the components running on the control plane nodes, and keeping it fully operational and responsive is key to the proper functioning of Kubernetes clusters. Learning how to monitor CoreDNS, and what its most important metrics are, is a must for operations teams.

https://sysdig.com/blog/how-to-monitor-coredns

Managing Grafana Dashboards With Terraform

We’ve all done it: deleted a graph from a dashboard, realised we still need it, but forgotten the query. Use Terraform to go back in time and save yourself the headache.

https://betterprogramming.pub/managing-grafana-dashboards-with-terraform-ad49ff6bb552

The DevOps Hangover

The greatest irony is that DevOps aimed to help developers talk with operations, and vice versa, yet developers are even further away from understanding how their code operates at runtime.

https://www.linkedin.com/pulse/devops-hangover-pete-cheslock

Whose Cert Is It Anyway?

https://www.netmeister.org/blog/caa-diversity.html

Migrating Terraform state from Terraform Cloud to S3

https://blog.marcolancini.it/2023/blog-migrate-terraform-state-from-terraform-cloud-to-s3

Moving Terraform Managed Resources Between States for Scaling AWS Infrastructure in Startups

https://fivexl.io/blog/terraform-mv

Terraform check{} Block

https://unfriendlygrinch.info/posts/terraform-check-block

Why `fsync()`: Losing unsynced data on a single node leads to global data loss

Regardless of the replication mechanism, you must fsync() your data to prevent global data loss in non-Byzantine protocols.

https://redpanda.com/blog/why-fsync-is-needed-for-data-safety-in-kafka-or-non-byzantine-protocols
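
As a rough illustration of what "fsync() your data" means on a single node, here is a minimal Python sketch. The helper name `durable_append` and the log file path are hypothetical and do not come from the linked article; the sketch only shows the usual flush-then-fsync sequence, not how any particular replicated system does it.

```python
import os


def durable_append(path: str, record: bytes) -> None:
    """Append a record and return only after fsync() has completed.

    Hypothetical helper for illustration: a real log would also need to
    fsync the parent directory when the file is first created.
    """
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # move data from Python's buffer to the OS page cache
        os.fsync(f.fileno())  # ask the OS to write the page cache to the device


if __name__ == "__main__":
    durable_append("example.log", b"acknowledged write\n")
```

Without the `os.fsync` call, a write can be acknowledged while it still lives only in the page cache, so a crash or power loss on that node may silently drop it.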

Fleet Management at Spotify

Part 1: Spotify’s Shift to a Fleet-First Mindset - https://engineering.atspotify.com/2023/04/spotifys-shift-to-a-fleet-first-mindset-part-1

Part 2: The Path to Declarative Infrastructure - https://engineering.atspotify.com/2023/05/fleet-management-at-spotify-part-2-the-path-to-declarative-infrastructure

Part 3: Fleet-wide Refactoring - https://engineering.atspotify.com/2023/05/fleet-management-at-spotify-part-3-fleet-wide-refactoring

SSH-powered pastebin with a human-friendly TUI and web UI

snips.sh is a free, anonymous, open source, snippet hosting service

https://snips.sh

Learn why you can't ping a Kubernetes service

TL;DR: in this article, you will learn how ClusterIP services and kube-proxy work in Kubernetes.

https://dev.to/danielepolencic/learn-why-you-cant-ping-a-kubernetes-service-3nlm

Manage multiple Terraform projects in monorepo

https://janik6n.net/posts/manage-multiple-terraform-projects-in-monorepo

Using Terragrunt run-all in AWS CI/CD

https://levelup.gitconnected.com/using-terragrunt-run-all-in-aws-ci-cd-df2877bad198

The Must-Know Terraform Interview Questions

https://devopsknowledge.hashnode.dev/the-must-know-terraform-interview-questions

Monitoring a Rust Web Application Using Prometheus and Grafana

https://betterprogramming.pub/monitoring-a-rust-web-application-using-prometheus-and-grafana-3c75d9435dec

USE vs RED vs The Four Golden Signals

https://faun.pub/use-vs-red-vs-the-four-golden-signals-50655e93fad7

Prometheus Alertmanager best practices

https://dev.to/sysdig/prometheus-alertmanager-best-practices-4872

Grafana Mimir — our journey towards infinite wisdom with 5m active time series

https://tech.loveholidays.com/grafana-mimir-our-journey-towards-infinite-wisdom-with-5m-active-time-series-7a262ba53a3f

6 Best Practices for Effective Monitoring Alerts

https://medium.com/@bregman.arie/6-best-practices-for-effective-monitoring-alerts-a585bfc0d830
