DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN registration: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
Vibe coding tools observability with VictoriaMetrics Stack and OpenTelemetry

https://victoriametrics.com/blog/vibe-coding-observability
taws

taws provides a terminal UI to interact with your AWS resources. The aim of this project is to make it easier to navigate, observe, and manage your AWS infrastructure in the wild.


https://github.com/huseyinbabal/taws
worktrunk

Worktrunk is a CLI for git worktree management, designed for running AI agents in parallel.

Worktrunk's three core commands make worktrees as easy as branches. Plus, Worktrunk has a bunch of quality-of-life features to simplify working with many parallel changes, including hooks to automate local workflows.


https://github.com/max-sixty/worktrunk
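Worktrunk builds on plain `git worktree`, where each parallel task gets its own checkout and branch. As a rough illustration of that underlying mechanism (this is not Worktrunk's own CLI, whose commands are not shown here), a minimal Python sketch could create one worktree per agent task. The `agent/<task>` branch naming and the sibling-directory layout are hypothetical conventions chosen only for this example.

```python
import subprocess
from pathlib import Path


def worktree_add_cmd(repo: Path, task: str, base: str = "HEAD") -> list[str]:
    """Build a `git worktree add -b <branch> <path> <base>` command for one task.

    Hypothetical convention: branch `agent/<task>`, checkout in a sibling
    directory named `<repo>.<task>`.
    """
    branch = f"agent/{task}"
    path = repo.parent / f"{repo.name}.{task}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktrees(repo: Path, tasks: list[str], dry_run: bool = True) -> list[list[str]]:
    """Create (or, in dry-run mode, only print) one worktree per parallel task."""
    cmds = [worktree_add_cmd(repo, task) for task in tasks]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if git fails (e.g. branch exists)
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Dry run by default: prints the git commands without touching any repository.
    create_worktrees(Path("/tmp/myrepo"), ["fix-login", "add-metrics"])
```

Because each agent works in its own directory on its own branch, the agents never share a working tree. Their changes meet only when the branches are merged.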
snitch

A friendlier ss/netstat for humans. Inspect network connections with a clean TUI or styled tables.


https://github.com/karol-broda/snitch
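For context on what such tools display: on Linux, netstat has traditionally read the kernel's `/proc/net/tcp` table, while ss queries the kernel over netlink instead. The following minimal Python sketch decodes IPv4 entries from that table. It is not taken from snitch. It assumes a little-endian host such as x86-64, where the kernel prints the IPv4 address as a byte-swapped hex integer.

```python
import socket
import struct
from pathlib import Path

# TCP state codes as they appear in /proc/net/tcp (Linux kernel tcp_states.h)
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def decode_addr(hex_addr: str) -> tuple[str, int]:
    """Turn e.g. '0100007F:0277' into ('127.0.0.1', 631)."""
    ip_hex, port_hex = hex_addr.split(":")
    # Assumption: little-endian host, so the 32-bit value is packed back with "<I"
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    # The port is already printed in host byte order
    return ip, int(port_hex, 16)


def parse_line(line: str) -> dict:
    """Parse one data row: 'sl local_address rem_address st ...'."""
    fields = line.split()
    return {
        "local": decode_addr(fields[1]),
        "remote": decode_addr(fields[2]),
        "state": TCP_STATES.get(fields[3], fields[3]),
    }


def read_connections(path: str = "/proc/net/tcp") -> list[dict]:
    """Read all IPv4 TCP sockets; returns [] where the file does not exist."""
    p = Path(path)
    if not p.exists():
        return []
    lines = p.read_text().splitlines()[1:]  # skip the header row
    return [parse_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    for conn in read_connections():
        (lip, lport), (rip, rport) = conn["local"], conn["remote"]
        print(f"{conn['state']:<12} {lip}:{lport} -> {rip}:{rport}")
```

IPv6 sockets live in `/proc/net/tcp6` with 128-bit addresses and would need a different decoder.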
IO devices and latency

Here, we're going to cover the history, functionality, and performance of non-volatile storage devices across the decades of computing, all using fun and interactive visual elements.


https://planetscale.com/blog/io-devices-and-latency
sshs

Terminal user interface for SSH.


https://github.com/quantumsheep/sshs
SRE Is Anti-Transactional

If you ask 10 SRE engineers to define SRE, you'll get 11 definitions.


https://queue.acm.org/detail.cfm?ref=rss&id=3773094
Resilience vs. Fault tolerance

In this post, I will discuss whether there is a difference between resilience and fault tolerance when talking about IT systems.


https://www.ufried.com/blog/resilience_vs_fault_tolerance
Datadog, Thank You for Blocking Us

Datadog cut off our observability overnight. We migrated to an open Grafana stack in 48 hours. Here’s why vendor lock-in is fading in an AI-native world.


https://www.deductive.ai/blogs/datadog-thank-you-for-blocking-us
You Can’t Debug a System by Blaming a Person

“I understand why we need to be blameless, but I have this person in my team who is often reckless. How can I not blame them when their actions continuously make things worse?”

Someone asked me this at the SRE meetup, right after my talk on incidents. Since then I’ve been thinking about it, because it surfaces a concern many people might have.


https://humansinsystems.com/blog/you-cant-debug-a-systems-by-blaming-a-person
Eliminate sensitive values from Terraform state using write-only attributes

https://skundunotes.com/2025/12/22/eliminate-sensitive-values-from-terraform-state-using-write-only-attributes
How We Moved a 2M RPM WebSocket Service to EKS and Fixed a Critical Bottleneck

Lessons in systems because AWS deprecated OpsWorks


https://medium.com/freshworks-engineering-blog/two-million-websockets-90f63e760cfd
Scaling Dagster on Kubernetes: Best Practices for 50+ Code Locations

https://u11d.com/blog/scaling-dagster-kubernetes-multi-code-locations
Investigating and fixing "StopPodSandbox from runtime service failed" Kubelet errors

https://marcusnoble.co.uk/2025-09-28-investigating-and-fixing-stoppodsandbox-from-runtime-service-failed-kubelet-errors
HOWTO: Use SimKube for Cost Forecasting

Recently, I’ve had a number of folks ask for some more details about how SimKube can be used to predict or forecast your Kubernetes expenditures, and I realized that I’ve said you can do this several times, but I’ve never actually gone through the details! So this post will show you how.


https://blog.appliedcomputing.io/p/howto-use-simkube-for-cost-forecasting
kanidm

Kanidm is a simple and secure identity management platform, allowing other applications and services to offload the challenge of authenticating and storing identities to Kanidm.

The goal of this project is to be a complete identity provider, covering the broadest possible set of requirements and integrations. You should not need any other components (like Keycloak) when you use Kanidm: we already have everything you need!

To achieve this we rely heavily on strict defaults, simple configuration, and self-healing components. This allows Kanidm to support small home labs, families, small businesses, and all the way to the largest enterprise needs.

If you want to host your own authentication service, then Kanidm is for you!


https://github.com/kanidm/kanidm
kubernetes-nmstate

Declarative node network configuration driven through Kubernetes API.


https://github.com/nmstate/kubernetes-nmstate
kide

OpenObserve Kide is a lightweight and fast Kubernetes IDE.


https://github.com/openobserve/kide
zot

zot: a production-ready vendor-neutral OCI image registry - images stored in OCI image format, distribution specification on-the-wire, that's it!


https://github.com/project-zot/zot