DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor) registration: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
Kubernetes egress control with Squid proxy

Shows how to enforce and observe Kubernetes egress traffic with Squid plus NetworkPolicy without adding a service mesh.


https://interlaye.red/kubernetes_002degress_002dsquid.html
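Squid allowlists are usually written as `dstdomain` ACLs. An entry with a leading dot (`.example.com`) matches the domain itself and all of its subdomains. A bare entry matches only that exact host. The minimal Python sketch below reproduces that matching rule so an allowlist can be unit-tested before it is loaded into Squid. It is not Squid's code, and the function names and sample domains are hypothetical.

```python
def squid_dstdomain_match(host: str, entry: str) -> bool:
    """Approximate Squid's dstdomain matching for one ACL entry (sketch)."""
    host = host.lower().rstrip(".")
    entry = entry.lower().rstrip(".")
    if entry.startswith("."):
        # ".example.com" matches example.com itself and every subdomain
        return host == entry[1:] or host.endswith(entry)
    # A bare entry matches that exact hostname only
    return host == entry


def is_egress_allowed(host: str, allowlist: list) -> bool:
    """True if any allowlist entry matches the destination host."""
    return any(squid_dstdomain_match(host, entry) for entry in allowlist)


# Hypothetical allowlist, as it might appear in `acl allowed dstdomain ...` lines
ALLOWLIST = [".github.com", "pypi.org"]

for h in ["github.com", "api.github.com", "pypi.org", "files.pypi.org", "evil.example"]:
    print(f"{h}: {'allow' if is_egress_allowed(h, ALLOWLIST) else 'deny'}")
```

Pair this with a NetworkPolicy that lets pods reach only the proxy (and typically DNS). The list then effectively decides which external HTTP(S) hosts workloads can contact.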
How We Turned a Forced OS Migration into a 30% Infrastructure Reduction

Scout24 used an Amazon Linux 2 migration window to adopt Karpenter and cut EKS node count by about 30%.


https://scout24.medium.com/infinity-transformation-how-we-turned-a-forced-os-migration-into-a-30-infrastructure-reduction-1a41237307b8
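Karpenter lowers node count partly by consolidating pods onto fewer, better-fitting nodes. The toy first-fit-decreasing bin-packing below shows why tighter packing of the same CPU requests needs fewer nodes. It is not Karpenter's actual consolidation algorithm, and the pod requests and node capacity are hypothetical.

```python
def first_fit_decreasing(requests_m: list, node_capacity_m: int) -> list:
    """Pack pod CPU requests (millicores) onto nodes, largest requests first."""
    nodes = []  # each node is a list of the requests placed on it
    free = []   # remaining capacity per node, same index as `nodes`
    for req in sorted(requests_m, reverse=True):
        if req > node_capacity_m:
            raise ValueError(f"pod request {req}m exceeds node capacity")
        for i, space in enumerate(free):
            if req <= space:
                nodes[i].append(req)
                free[i] -= req
                break
        else:
            # No existing node has room: provision a new one
            nodes.append([req])
            free.append(node_capacity_m - req)
    return nodes


# Hypothetical pod CPU requests (millicores) and 4000m allocatable per node
pods = [2500, 1500, 1200, 1000, 800, 700, 500, 300]
packed = first_fit_decreasing(pods, 4000)
print(len(packed), packed)  # 3 [[2500, 1500], [1200, 1000, 800, 700, 300], [500]]
```

The lower bound here is ceil(8500 / 4000) = 3 nodes. The packing reaches it, while looser placement of the same pods would spread them over more nodes.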
Auto-scaling and Load-based Scaling

Explains reactive metric-based scaling versus scheduled scaling and where each approach fits.


https://blog.felipefr.dev/auto-scaling-and-load-based-scaling
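The Kubernetes HorizontalPodAutoscaler implements the reactive approach. Its documented core formula is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). By default it skips scaling when that ratio is within 10% of 1.0. Scheduled scaling instead looks up a replica count by time of day. The Python sketch below shows both. The schedule table and the min/max bounds are hypothetical.

```python
import math
from datetime import datetime


def reactive_replicas(current: int, current_metric: float, target_metric: float,
                      tolerance: float = 0.1, min_replicas: int = 1,
                      max_replicas: int = 20) -> int:
    """HPA-style: scale replicas in proportion to current/target metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: no change
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical schedule: (start hour, replicas), sorted, first entry at hour 0
SCHEDULE = [(0, 2), (8, 10), (18, 4)]


def scheduled_replicas(now: datetime) -> int:
    """Use the most recent schedule entry whose start hour has passed."""
    replicas = SCHEDULE[0][1]
    for start_hour, count in SCHEDULE:
        if now.hour >= start_hour:
            replicas = count
    return replicas


print(reactive_replicas(4, current_metric=90, target_metric=50))  # 8
print(reactive_replicas(4, current_metric=52, target_metric=50))  # 4 (within tolerance)
print(scheduled_replicas(datetime(2024, 1, 1, 9)))                # 10
```

The two approaches can also be combined, using the scheduled value as a floor under the reactive one for predictable peaks.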
rtk

CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary with zero dependencies.


https://github.com/rtk-ai/rtk
Integration testing with Kubernetes

Shows a Rust-based integration testing workflow on kind with Terraform and cleanup policies for parallel runs.


https://mikamu.substack.com/p/integration-testing-with-kubernetes
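One way to keep parallel test runs from colliding is to give each run its own uniquely named kind cluster. Clusters left behind by crashed runs can then be deleted once they pass a TTL. The Python sketch below covers only the naming and selection logic. The `it-<unix-timestamp>-<suffix>` scheme and the one-hour TTL are hypothetical. Actual deletion would use `kind delete cluster --name <name>`.

```python
import secrets
import time
from typing import Optional

PREFIX = "it"  # hypothetical prefix reserved for integration-test clusters


def new_cluster_name(now: Optional[float] = None) -> str:
    """Per-run name: creation time plus a random suffix, so parallel runs never collide."""
    created = int(time.time() if now is None else now)
    return f"{PREFIX}-{created}-{secrets.token_hex(3)}"


def stale_clusters(names: list, ttl_seconds: int, now: float) -> list:
    """Return our test clusters whose embedded creation time is older than the TTL."""
    stale = []
    for name in names:
        parts = name.split("-")
        if len(parts) != 3 or parts[0] != PREFIX or not parts[1].isdigit():
            continue  # not created by this scheme: never delete it
        if now - int(parts[1]) > ttl_seconds:
            stale.append(name)
    return stale


# Names as `kind get clusters` might list them (hypothetical)
existing = ["kind", "it-1700000000-a1b2c3", "it-1700003500-d4e5f6"]
for name in stale_clusters(existing, ttl_seconds=3600, now=1700004000):
    print("kind delete cluster --name", name)
```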
Vault: secure Kubernetes authentication with HashiCorp Vault OIDC

Explains how to use Vault as an OIDC provider to replace static kubeconfig credentials with short-lived tokens.


https://phuchoang.sbs/posts/gitops-kubernetes-oidc-vault
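With OIDC, kubectl presents an ID token. The API server accepts it only if the issuer, audience, expiry and signature all check out. The Python sketch below decodes a JWT payload with the standard library and checks the time-based claims that make these credentials short-lived. It deliberately skips signature verification, which real validation must do against the issuer's published keys. The issuer URL, audience and 15-minute lifetime are hypothetical.

```python
import base64
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(obj: dict) -> str:
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


def check_claims(token: str, issuer: str, audience: str, now: float,
                 leeway: int = 30) -> dict:
    """Check iss/aud/exp/nbf of a JWT. Does NOT verify the signature."""
    _header, payload, _signature = token.split(".")
    claims = json.loads(_b64url_decode(payload))
    if claims.get("iss") != issuer:
        raise ValueError("unexpected issuer")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise ValueError("token was not issued for this audience")
    if now > claims["exp"] + leeway:
        raise ValueError("token expired")
    if now + leeway < claims.get("nbf", 0):
        raise ValueError("token not valid yet")
    return claims


def make_test_token(claims: dict) -> str:
    """Unsigned token for local experiments only."""
    return f"{_b64url_encode({'alg': 'none'})}.{_b64url_encode(claims)}."


ISSUER = "https://vault.example.com/v1/identity/oidc/provider/k8s"  # hypothetical
now = time.time()
token = make_test_token({"iss": ISSUER, "aud": "kubernetes", "sub": "alice",
                         "iat": int(now), "exp": int(now) + 900})  # 15-minute lifetime
print(check_claims(token, ISSUER, "kubernetes", now)["sub"])  # alice
```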
Security Inside Kubernetes: Admission & Runtime Guardrails with Kyverno and KubeArmor

Covers layered Kubernetes security by combining Kyverno admission policies with KubeArmor runtime enforcement.


https://medium.com/globant/security-inside-kubernetes-admission-runtime-guardrails-with-kyverno-and-kubearmor-6d2f97264cbc
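Kyverno policies are written as Kubernetes YAML resources, but the admission decision they encode is plain logic over the incoming object. The Python sketch below expresses one check a validate rule might make: no privileged containers, and `runAsNonRoot` required. It illustrates the decision only. It is not Kyverno's engine or policy syntax, and it does not model the runtime layer that KubeArmor adds. The sample Pod is hypothetical.

```python
def admission_violations(pod: dict) -> list:
    """Return policy violations for a Pod manifest; an empty list means admit."""
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext", {})
    violations = []
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        name = c.get("name", "<unnamed>")
        sc = c.get("securityContext", {})
        if sc.get("privileged", False):
            violations.append(f"{name}: privileged mode is not allowed")
        # A container-level runAsNonRoot overrides the pod-level value
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            violations.append(f"{name}: runAsNonRoot must be true")
    return violations


pod = {
    "spec": {
        "securityContext": {"runAsNonRoot": True},
        "containers": [
            {"name": "app", "image": "example/app:1.0"},
            {"name": "debug", "image": "example/tools:1.0",
             "securityContext": {"privileged": True}},
        ],
    }
}
print(admission_violations(pod))  # ['debug: privileged mode is not allowed']
```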
Crust-Gather - kubectl Cluster Snapshot Plugin

Open-source kubectl plugin for collecting a structured cluster snapshot for debugging and analysis.


https://github.com/crust-gather/crust-gather
Kogaro - Kubernetes Configuration Hygiene Agent

Agent project focused on improving Kubernetes configuration hygiene and reducing misconfiguration risk.


https://github.com/topiaruss/kogaro
llm-d: SOTA inference performance

Kubernetes-native distributed inference stack, built around vLLM, targeting high-performance large language model serving.


https://github.com/llm-d/llm-d
Kthena: Enterprise LLM serving

Enterprise-oriented platform for serving and operating LLM workloads on Kubernetes.


https://github.com/volcano-sh/kthena
Easykube: Local Kubernetes development

Tooling aimed at simplifying local Kubernetes development environments.


https://github.com/torloejborg/easykube
Guardon: Kubernetes security extension

Security-focused extension project for strengthening Kubernetes environments.


https://github.com/guardon-dev/guardon
difftastic

A structural diff tool that understands syntax.


http://github.com/Wilfred/difftastic
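difftastic parses files into syntax trees using tree-sitter and diffs those trees, so reformatting alone does not show up as a change. The Python sketch below is a much cruder approximation of that idea, using the standard `tokenize` and `difflib` modules. It compares token streams instead of lines, so it ignores whitespace and line breaks, but it has none of difftastic's tree awareness. The function names are hypothetical.

```python
import difflib
import io
import tokenize

# Token types that only carry layout, which this sketch ignores
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
          tokenize.DEDENT, tokenize.ENDMARKER}


def significant_tokens(source: str) -> list:
    """Tokenize Python source and keep only the non-layout token strings."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in LAYOUT]


def token_diff(old: str, new: str) -> list:
    """(op, old_tokens, new_tokens) for each region that differs."""
    a, b = significant_tokens(old), significant_tokens(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(op, a[i1:i2], b[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]


old = "total = compute(a, b)\n"
reformatted = "total = compute(\n    a,\n    b\n)\n"
print(token_diff(old, reformatted))                # [] : layout-only change
print(token_diff(old, "total = compute(a, c)\n"))  # [('replace', ['b'], ['c'])]
```

A line-based diff would report every line of `reformatted` as changed. The token view reports nothing, which is the behaviour difftastic achieves more robustly.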
We Cut Our Kubernetes Pods by 60% and Doubled Traffic Capacity

This case study explains how JVM tuning, a smaller HikariCP connection pool, and faster HPA scale-up doubled traffic capacity while reducing baseline pods.


https://medium.com/@feridquluzade2002/we-cut-our-kubernetes-pods-by-60-and-doubled-traffic-capacity-b1cfb6850fca
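Pool size and pod count multiply: every replica holds its own pool, so the database sees pool size × replicas connections at peak. HikariCP's "About Pool Sizing" notes cite a PostgreSQL-derived starting point of connections = (core_count × 2) + effective_spindle_count. The Python sketch below combines that starting point with a split across the HPA's maximum replicas. The core count, spindle count and replica limit are hypothetical, not figures from the case study.

```python
def suggested_total_connections(db_cores: int, effective_spindles: int) -> int:
    """Starting point quoted in HikariCP's pool-sizing notes; tune by measurement."""
    return db_cores * 2 + effective_spindles


def per_pod_pool_size(total_budget: int, max_replicas: int, reserved: int = 0) -> int:
    """Split a connection budget across the maximum number of HPA replicas."""
    usable = total_budget - reserved  # e.g. keep some connections for admin/migrations
    if usable < max_replicas:
        raise ValueError("budget too small for one connection per replica")
    return usable // max_replicas


def peak_connections(pool_size: int, replicas: int) -> int:
    """Connections the database sees when every replica's pool is full."""
    return pool_size * replicas


budget = suggested_total_connections(db_cores=16, effective_spindles=1)
pool = per_pod_pool_size(budget, max_replicas=10)
print(budget, pool, peak_connections(pool, 10))  # 33 3 30
```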
Hidden Kubernetes Bad Practices Learned the Hard Way During Incidents

This article distills incident-driven lessons on troubleshooting, configuration mistakes, and operational habits that make Kubernetes outages worse.


https://hackernoon.com/hidden-kubernetes-bad-practices-learned-the-hard-way-during-incidents
From Chaos to 99.9% Uptime: Rebuilding a Kubernetes Platform for GPU Workloads

This article covers rebuilding a Kubernetes platform for GPU workloads to reach 99.9% uptime after operational instability.


https://medium.com/@mateenali66/from-chaos-to-99-9-uptime-rebuilding-a-kubernetes-platform-for-gpu-workloads-4fadb1067a0b
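An uptime target maps directly to an error budget. For 99.9% over a 30-day window, the allowed downtime is 30 × 24 × 60 × (1 − 0.999) = 43.2 minutes. The small Python helper below does that arithmetic. The 30-day window is an assumption, because the summary above does not say which measurement window the article uses.

```python
def error_budget_minutes(slo: float, window_days: float = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    if not 0 < slo < 1:
        raise ValueError("SLO must be a fraction, e.g. 0.999")
    return window_days * 24 * 60 * (1 - slo)


for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%}: {error_budget_minutes(slo):.1f} min per 30 days")
# 99.00%: 432.0 min per 30 days
# 99.90%: 43.2 min per 30 days
# 99.99%: 4.3 min per 30 days
```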
Benchmarking Kubernetes Log Collectors: vlagent, Vector, Fluent Bit, OpenTelemetry Collector, and more

At VictoriaMetrics, we built vlagent as a high-performance log collector for VictoriaLogs. To validate its performance and correctness under a realistic, production-like load, we developed a benchmark suite and ran it against 8 popular log collectors. This post covers the methodology, throughput results, resource usage, and delivery correctness.


https://victoriametrics.com/blog/log-collectors-benchmark-2026/index.html
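Checking delivery correctness means confirming that every emitted log line arrives exactly once. One common approach is to embed a sequence number per source stream and compare it with what the sink received: gaps are lost lines and repeats are duplicates. The Python sketch below implements that check. It is not necessarily the method VictoriaMetrics used, and the single-stream record shape is hypothetical; with several streams, run it once per stream.

```python
from collections import Counter


def delivery_report(sent_count: int, received_seqs: list) -> dict:
    """Compare received sequence numbers against 0..sent_count-1."""
    counts = Counter(received_seqs)
    lost = [s for s in range(sent_count) if counts[s] == 0]
    duplicated = sorted(s for s, n in counts.items() if n > 1)
    unexpected = sorted(s for s in counts if not 0 <= s < sent_count)
    return {
        "sent": sent_count,
        "received": len(received_seqs),
        "lost": lost,
        "duplicated": duplicated,
        "unexpected": unexpected,
        "exactly_once": not (lost or duplicated or unexpected),
    }


report = delivery_report(5, [0, 1, 1, 3, 4])
print(report["lost"], report["duplicated"], report["exactly_once"])  # [2] [1] False
```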
Making and scaling a game server in Kubernetes using Agones

This tutorial walks through building a Go game server with Agones, matchmaking, Fleet allocation, and autoscaling on Kubernetes.


https://noe-t.dev/posts/making-and-scaling-a-game-server-in-k8s-using-agones
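Agones' FleetAutoscaler has a Buffer policy that keeps a set number of Ready game servers on top of those already Allocated, clamped to `minReplicas` and `maxReplicas`. The Python sketch below reproduces that arithmetic for an absolute buffer size. It simplifies the real controller, leaving out percentage buffers and the other policy types. The numbers are hypothetical.

```python
def buffer_policy_replicas(allocated: int, buffer_size: int,
                           min_replicas: int, max_replicas: int) -> int:
    """Fleet size = allocated servers + ready buffer, clamped to [min, max]."""
    if buffer_size < 0 or min_replicas > max_replicas:
        raise ValueError("invalid autoscaler settings")
    desired = allocated + buffer_size
    return max(min_replicas, min(max_replicas, desired))


# As matches allocate servers, the fleet grows to keep 5 Ready ones in reserve
for allocated in (0, 12, 18):
    print(allocated, buffer_policy_replicas(allocated, buffer_size=5,
                                            min_replicas=5, max_replicas=20))
# 0 5
# 12 17
# 18 20
```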
PostgreSQL migration with CloudNativePG Logical Replication on Kubernetes - Zero-Downtime

This tutorial shows how to migrate PostgreSQL to CloudNativePG on Kubernetes with logical replication and no downtime.


https://kndoni.medium.com/postgresql-migration-with-cloudnativepg-logical-replication-on-kubernetes-zero-downtime-aef1c33a3a53
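Before cutting applications over to the new CloudNativePG cluster, it helps to confirm that the subscriber has caught up with the source. PostgreSQL prints LSNs as two hexadecimal halves (`X/Y`), so replication lag in bytes is a subtraction once both values are converted to integers. In SQL, `pg_wal_lsn_diff()` does the same. The Python sketch below assumes the inputs come from `pg_current_wal_lsn()` on the source and the slot's `confirmed_flush_lsn` in `pg_replication_slots`. The zero-byte threshold is a hypothetical default.

```python
def parse_lsn(lsn: str) -> int:
    """Convert PostgreSQL's 'X/Y' LSN text form to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(source_lsn: str, confirmed_lsn: str) -> int:
    """Bytes of WAL the subscriber has not yet confirmed (never negative)."""
    return max(0, parse_lsn(source_lsn) - parse_lsn(confirmed_lsn))


def ready_for_cutover(source_lsn: str, confirmed_lsn: str, max_lag: int = 0) -> bool:
    """True once the subscriber is within max_lag bytes of the source."""
    return lag_bytes(source_lsn, confirmed_lsn) <= max_lag


print(lag_bytes("0/3000148", "0/3000060"))     # 232
print(ready_for_cutover("1/0", "0/FFFFFFFF"))  # False (1 byte behind)
```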