DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor) registration: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
EKS Auto Mode: Simplify Kubernetes with Terraform Setup

Instead of managing node groups, installing Karpenter, configuring the VPC CNI plugin, deploying the AWS Load Balancer Controller, setting up the EBS CSI driver, and keeping all of those components updated and compatible with each other, you enable a single flag and AWS handles all of it.


https://darryl-ruggles.cloud/a-complete-terraform-setup-for-eks-auto-mode-is-it-right-for-you
Terraform Tips from the IaC Trenches

After a few years of writing open-source Terraform modules, I've picked up a few syntax tricks that make code safer, cleaner, and easier to maintain.


https://rosesecurity.dev/2025/12/04/terraform-tips-and-tricks.html
How We Manage Domain and DNS Management with Infrastructure as Code

After successfully adopting Terraform for GitHub repository management, the next step in our Infrastructure as Code (IaC) journey was clear: dogfood our own product and manage our domains and DNS zones using the DNSimple Terraform provider.


https://blog.dnsimple.com/2025/11/managing-domains-terraform-dnsimple
DriftHound

DriftHound is a Rails WebApp that receives Terraform drift reports via API and provides visibility into infrastructure drift across your projects.


https://github.com/drifthoundhq/drifthound
otel-front

A lightweight, single-binary OpenTelemetry viewer for local development. Visualize traces, logs, and metrics from your instrumented applications — no Docker, no databases, no complex setup.


https://github.com/mesaglio/otel-front
scion

Run multiple agents in parallel — each in its own container, with its own workspace, collaborating on your code or project files simultaneously.


https://github.com/GoogleCloudPlatform/scion
atomic

A personal knowledge base that turns markdown notes into a semantically-connected, AI-augmented knowledge graph.

Atomic stores knowledge as atoms — markdown notes that are automatically chunked, embedded, tagged, and linked by semantic similarity. Your atoms can be synthesized into wiki articles, explored on a spatial canvas, and queried through an agentic chat interface.


https://github.com/kenforthewin/atomic
Inside a Self-Hosted AI Coding Assistant: Architecture, Kubernetes Deployment, and llama.cpp Parallelism

More and more enterprises want the benefits of AI-assisted coding, automatic completions, suggestions, and inline generation, without sending their source code to external APIs.

This has naturally increased interest in self-hosted coding assistants, where all inference runs on internal hardware and all models stay inside a controlled environment.

We built a complete prototype of such a system. In this article, we walk through its architecture, explain how Kubernetes is used to deploy it, and how different system parameters interact to determine real-world performance. In a separate post, we study how the llama.cpp inference server behaves under increasing load.


https://medium.com/@ferraricorneloup.teo/inside-a-self-hosted-ai-coding-assistant-architecture-kubernetes-deployment-and-llama-cpp-158330a12441
How My Client Hit Linux Kernel Network Limits on AWS EKS

This is a story about a tricky issue I resolved recently.


https://dev.to/datton94/how-my-client-hit-linux-kernel-network-limits-on-aws-eks-3am5
Startup CPU Boost in Kubernetes with In-Place Pod Resize

This article explains how to use the In-Place Pod Resize feature in Kubernetes, combined with Kube Startup CPU Boost, to speed up Java application startup.


https://piotrminkowski.com/2025/12/22/startup-cpu-boost-in-kubernetes-with-in-place-pod-resize/
dynamo

The open-source, datacenter-scale inference stack. Dynamo is the orchestration layer above inference engines — it doesn't replace SGLang, TensorRT-LLM, or vLLM, it turns them into a coordinated multi-node inference system. Disaggregated serving, intelligent routing, multi-tier KV caching, and automatic scaling work together to maximize throughput and minimize latency for LLM, reasoning, multimodal, and video generation workloads.


https://github.com/ai-dynamo/dynamo
helm-exporter

Exports Helm release, chart, and version statistics in the Prometheus format.


https://github.com/sstarcher/helm-exporter
Inside Adobe's OpenTelemetry pipeline: simplicity at scale

As part of an ongoing series, the Developer Experience SIG interviews organizations about their real-world OpenTelemetry Collector deployments to share practical lessons with the broader community. This post features Adobe, a global software company whose observability team has built an OpenTelemetry-based telemetry pipeline designed for simplicity at massive scale, with thousands of collectors running per signal type across the company’s infrastructure.


https://opentelemetry.io/blog/2026/devex-adobe
traceway

Traceway is a self-hosted observability platform that ingests OpenTelemetry traces and metrics, groups exceptions automatically, and gives you endpoint performance, distributed tracing, and alerts — all in a single binary. No OTel Collector or separate time-series database required.


https://github.com/tracewayapp/traceway
lynxdb

Log analytics in a single binary. No dependencies. Lynx Flow query language.


https://github.com/lynxbase/lynxdb
cardamon

Cardamon is a metric auditor for Prometheus. It identifies metrics that exist in your TSDB but are never actually queried by dashboards, alerting rules, recording rules, or any other consumer. You can then generate Prometheus drop rules to remove them and reduce storage needs.


https://github.com/dominikhei/cardamon
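
The audit idea in the cardamon description can be illustrated with a minimal Python sketch. This is not cardamon's implementation: the function names `unused_metrics` and `drop_rule` and the sample metric names are hypothetical, and the only real Prometheus detail assumed here is that a `metric_relabel_configs` entry with `action: drop` matching on `__name__` discards the matching series at scrape time.

```python
import re


def unused_metrics(stored, queried):
    """Return metric names that are stored but referenced by no consumer."""
    return sorted(set(stored) - set(queried))


def drop_rule(names):
    """Build one metric_relabel_configs entry that drops the given metrics.

    Returns None when there is nothing to drop, so no empty regex is emitted.
    """
    if not names:
        return None
    # Prometheus anchors relabel regexes, so a plain alternation is enough.
    regex = "|".join(re.escape(name) for name in names)
    return {"source_labels": ["__name__"], "regex": regex, "action": "drop"}


# Hypothetical inputs: names found in the TSDB vs. names seen in
# dashboards, alerting rules, and recording rules.
stored = ["http_requests_total", "go_gc_duration_seconds", "legacy_queue_depth"]
queried = ["http_requests_total"]

unused = unused_metrics(stored, queried)
rule = drop_rule(unused)
print(unused)
print(rule)
```

The resulting dictionary corresponds to a single entry that could be serialized under `metric_relabel_configs` in a scrape job. Collecting the list of queried names is the hard part in practice and is left out of this sketch.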
versitygw

Versity Gateway is a simple-to-use tool for seamless inline translation between AWS S3 object commands and storage systems. It bridges the gap between S3-reliant applications and other storage systems, enabling enhanced compatibility and integration while offering exceptional scalability.


https://github.com/versity/versitygw
Terragrunt 1.0 Released!

After nearly a decade of development, over 900 releases, and tens of millions of infrastructure deployments by platform teams, today we're happy to announce that Terragrunt 1.0 is officially here.


https://www.gruntwork.io/blog/terragrunt-1-0-released
Little Snitch for Linux

Every time an application on your computer opens a network connection, it does so quietly, without asking. Little Snitch for Linux makes that activity visible and gives you the option to do something about it. You can see exactly which applications are talking to which servers, block the ones you didn't invite, and keep an eye on traffic history and data volumes over time.


https://obdev.at/products/littlesnitch-linux/index.html