DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor) registration: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
sish

Open source SSH tunneling for HTTP(S), WS(S), TCP, aliases, and SNI.

If you like the simplicity of serveo/ngrok-style sharing but want to use plain SSH and run your own infrastructure, sish is built for that.


https://github.com/antoniomika/sish
Forwarded from AvitoTech
Ah, if only… But for now we work with what we have in SRE reality ↖️

The team behind the podcast «В SREду на кухне» devoted a whole episode to the error budget. Together with Kirill Borisov, a team lead at VK, they discussed:
🔸 what an error budget is and whether you can live without one;
🔸 how to explain to the business why it is needed;
🔸 how to calculate it;
🔸 why perfect reliability is a dangerous illusion and a myth;
🔸 how metrics help simplify the calculation.
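
For readers who want the arithmetic behind "how to calculate it", here is a minimal sketch of the usual definition: the budget is the share of events that the SLO allows to fail. It is not taken from the episode; the names `ErrorBudget` and `allowed_downtime_minutes` and the 30-day window are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: error budget for a request-based SLO."""

    slo: float          # target success ratio, e.g. 0.999
    total_events: int   # requests observed in the SLO window
    bad_events: int     # requests that violated the SLO

    @property
    def allowed_bad_events(self) -> float:
        # The budget is whatever the SLO leaves over: (1 - SLO) * total.
        return (1.0 - self.slo) * self.total_events

    @property
    def remaining_fraction(self) -> float:
        # 1.0 = budget untouched, 0.0 = fully spent, negative = overspent.
        allowed = self.allowed_bad_events
        if allowed == 0:
            # A 100% SLO leaves no budget at all.
            return 0.0 if self.bad_events else 1.0
        return 1.0 - self.bad_events / allowed


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Time-based view of the same budget: minutes of downtime the SLO permits."""
    return (1.0 - slo) * window_days * 24 * 60


if __name__ == "__main__":
    budget = ErrorBudget(slo=0.999, total_events=1_000_000, bad_events=250)
    print(f"allowed failures: {budget.allowed_bad_events:.0f}")
    print(f"budget remaining: {budget.remaining_fraction:.0%}")
    print(f"downtime allowed per 30 days: {allowed_downtime_minutes(0.999):.1f} min")
```

With a 99.9% SLO over one million requests the budget is about 1,000 failed requests, so 250 failures leave roughly three quarters of it unspent.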

Watch and listen via the links:

📱 YouTube
📱 VK
📱 Rutube

#sre
How we built a real-world evaluation platform for autonomous SRE agents at scale

Bits AI SRE is Datadog’s autonomous agent for investigating production incidents. It reasons across metrics, logs, traces, infrastructure metadata, network telemetry, monitor configuration, and more to determine, triage, and remediate the root cause of an issue.


https://www.datadoghq.com/blog/engineering/bits-ai-eval-platform
otel-cardinality-processor

An OpenTelemetry Collector processor that catches metric cardinality explosions before they reach your TSDB.
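
To illustrate what a cardinality explosion is, the sketch below counts distinct label sets per metric and refuses new series once a limit is reached. This is only the general idea and not the processor's actual algorithm or configuration, which the description above does not spell out; `CardinalityGuard` and its parameters are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Mapping, Set, Tuple

LabelSet = FrozenSet[Tuple[str, str]]


class CardinalityGuard:
    """Hypothetical guard: caps the number of distinct series per metric."""

    def __init__(self, max_series_per_metric: int) -> None:
        self.max_series = max_series_per_metric
        self._seen: Dict[str, Set[LabelSet]] = defaultdict(set)

    def admit(self, metric: str, labels: Mapping[str, str]) -> bool:
        """Return True if the data point may pass, False if it would add a series over the limit."""
        key = frozenset(labels.items())
        series = self._seen[metric]
        if key in series:
            return True   # already-known series always pass
        if len(series) >= self.max_series:
            return False  # a new series would exceed the limit
        series.add(key)
        return True

    def cardinality(self, metric: str) -> int:
        """Number of distinct label sets admitted so far for this metric."""
        return len(self._seen.get(metric, ()))


if __name__ == "__main__":
    guard = CardinalityGuard(max_series_per_metric=2)
    for path in ("/a", "/b", "/user/123", "/a"):
        ok = guard.admit("http_requests_total", {"path": path})
        print(path, "admitted" if ok else "dropped")
```

A label such as a user ID or a raw URL path is the typical culprit: every new value creates a new series, and the count grows without bound unless something like this check stops it before the TSDB.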


https://github.com/YElayyat/otel-cardinality-processor
otelite

Lightweight OpenTelemetry receiver and dashboard for local development

Otelite is a single-binary observability tool that receives OpenTelemetry data (logs, traces, metrics) and provides a web dashboard and terminal UI for viewing it. Designed for local LLM development with minimal resource usage (<100MB memory, <5% CPU), it starts in seconds and requires no external dependencies.


https://github.com/planetf1/otelite
goshs

goshs is a single-binary file server built for the moments when you need more than Python's SimpleHTTPServer but don't want to configure Apache. HTTP/S, WebDAV, SFTP, SMB, LDAP/S, basic auth, share links, DNS/SMTP callbacks, NTLM hash capture + cracking — all from one command.


https://github.com/patrickhener/goshs
quarkdown

Quarkdown is a modern Markdown-based typesetting system designed for versatility. It allows a single project to compile seamlessly into a print-ready book, academic paper, knowledge base, or interactive presentation. All through an incredibly powerful Turing-complete extension of Markdown, ensuring your ideas flow automatically into paper.


https://github.com/iamgio/quarkdown
agent-vault

An open-source credential broker by Infisical that sits between your agents and the APIs they call.
Agents should not possess credentials. Agent Vault eliminates credential exfiltration risk with brokered access.


https://github.com/Infisical/agent-vault
When upserts don't update but still write: Debugging Postgres performance at scale

At Datadog, we track the life cycle of millions of ephemeral hosts that report telemetry data to our platform. When a host stops emitting data, we eventually need to clean it up to avoid bloating our metadata store.

To detect inactive hosts, the Datadog team that manages the host metadata store introduced a new upsert to track the last time a host was seen. We expected this new query to have minimal impact. Each host would be updated at most once a day, so even at 25,000 upserts per second, most queries should have been no-ops.

But when we rolled out the new query, disk writes doubled and Write-Ahead Logging (WAL) syncs quadrupled. We discovered that even when an upsert doesn't change any values, it still locks the conflicting row, which is recorded in the WAL. Given that a Postgres cluster can only have a single writer, there's a hard limit to how many writes it can handle. The increase in disk writes introduced by the new query was consuming too much of this limited budget and had to be fixed.

In this post, we'll walk through how we diagnosed the unexpected overhead by inspecting Postgres's WAL and how we rewrote the query to eliminate the cost without sacrificing correctness.
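
One generic way to avoid paying for no-op upserts is to read first and write only when the stored value is stale. The sketch below shows that pattern; it is an assumption for illustration and not necessarily the rewrite Datadog describes in the post. SQLite (from the Python standard library) stands in for Postgres so the example runs anywhere, and the table `host_last_seen`, its columns, and `touch_host` are hypothetical names.

```python
import sqlite3

DAY_SECONDS = 24 * 60 * 60

# Hypothetical schema: one row per host with the last time it reported.
SCHEMA = (
    "CREATE TABLE host_last_seen ("
    "host_id TEXT PRIMARY KEY, last_seen INTEGER NOT NULL)"
)

UPSERT = """
INSERT INTO host_last_seen (host_id, last_seen) VALUES (?, ?)
ON CONFLICT (host_id) DO UPDATE SET last_seen = excluded.last_seen
"""


def touch_host(conn: sqlite3.Connection, host_id: str, now: int) -> bool:
    """Record that a host was seen; return True only if a write was issued."""
    row = conn.execute(
        "SELECT last_seen FROM host_last_seen WHERE host_id = ?", (host_id,)
    ).fetchone()
    if row is not None and now - row[0] < DAY_SECONDS:
        return False  # fresh enough: skip the upsert, so nothing is written
    # Missing or stale row: the upsert still resolves concurrent inserts safely.
    conn.execute(UPSERT, (host_id, now))
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    print(touch_host(conn, "host-1", 1_000_000))                # first sighting
    print(touch_host(conn, "host-1", 1_000_060))                # one minute later
    print(touch_host(conn, "host-1", 1_000_000 + DAY_SECONDS))  # a day later
```

The read and the write are not atomic, so two workers can both decide to write; the upsert keeps that case correct at the cost of one redundant write, which is acceptable when the goal is to cut the steady stream of no-ops.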


https://www.datadoghq.com/blog/engineering/debugging-postgres-performance
How We Reduced Median Memory Estimation Error by 99%, With the Help of AI

When you're running a system that processes hundreds of thousands of compaction jobs, even small inaccuracies in memory usage estimates compound into real operational pain.


https://mixpanel.substack.com/p/how-we-reduced-median-memory-estimation
honker

honker is a SQLite extension + language bindings that add Postgres-style NOTIFY/LISTEN semantics to SQLite, with built-in durable pub/sub, task queue, and event streams, without client polling or a daemon/broker. Any language that can SELECT load_extension('honker') gets the same features.


https://github.com/russellromney/honker
deepsec

deepsec is an agent-powered vulnerability scanner that you can run in your own infrastructure, optimized to perform on-demand review of all code in existing large-scale repos.


https://github.com/vercel-labs/deepsec
PGKeeper: Building the bouncer we needed for Postgres

This is the story of why and how we built PGKeeper, a scalable and reliable service to support Figma’s rapidly growing products and database workload.


https://www.figma.com/blog/pgkeeper-building-the-bouncer-we-needed-for-postgres
boring

A simple command line SSH tunnel manager that just works.


https://github.com/alebeck/boring
The Dao of Terraform Modules: Design and Governance Strategies for High-Quality Terraform Modules

https://lonegunmanb.github.io/dao-of-terraform-modules-book-english
👩‍💻 Happy Friday to all DevOps engineers and everyone else! We have put together a selection of useful TUIs for Kubernetes, Docker, and etcd.

DockMate (formerly DockWatch) is a lightweight terminal UI for monitoring performance in Docker and Podman Compose. It works as a Go-based alternative to LazyDocker. It gives you full control over your containers, plus access to monitoring metrics and panel metadata.

kubetui is an intuitive tool for watching Kubernetes resources in real time. It gives access to information about pods and their container logs, configurable monitoring of Secrets and ConfigMaps, event tracking, and simpler context handling.

etcd-walker is an application that lets you work with the etcd key-value store as if it were a file system. Create, delete, and export keys and directories through a single interface with support for all etcd versions, authentication, and TLS.

🤩 By the way, a reminder about a useful tool of ours:

🟡 nxs-universal-chart v3.0.7 is a modular platform for Kubernetes and platform-based application delivery. We described how to deploy a fully ready inference setup with it on Habr.

Find more useful tools in DevOps FM!

#девопс #tui #opensource

Advertisement. ООО «Никсис»,
INN: 5407461244, erid: 2Vtzqx3TURM
Terraform is dead

The more I look at how we actually build systems now, the more it looks like Terraform is dead.


https://grahamgilbert.com/blog/2026/04/20/terraform-is-dead
🎥 Webinar: "Organizing CD with Ansible and GitLab CI"

What we will cover:
- How to set up an automated deployment process with GitLab CI and Ansible.
- How to use Ansible playbooks and roles to manage infrastructure.
- Best practices for zero-downtime service updates and error handling.

What you will get:
- You will learn to automate CD processes with Ansible and GitLab CI.
- You will be able to build flexible and secure deployment pipelines for different environments.
- You will understand how to reduce the risk of deployment errors and minimize service downtime.
- You will learn to manage infrastructure configuration without unnecessary manual work.

👉 To take part, register: https://vk.cc/cXAbo1
🎁 All webinar participants will receive special terms for the full course "DevOps Practices and Tools"

Advertisement. ООО «Отус онлайн-образование», OGRN 1177746618576, www.otus.ru, erid: 2VtzqxMNiF9
waffle

Waffle is a CLI utility that automates AWS Well-Architected Framework Reviews by analyzing Terraform infrastructure with Amazon Bedrock foundation models, invoked directly through the AWS API. Bedrock analyzes and answers the Well-Architected Framework questions, and the answers are posted directly to the Well-Architected Tool in AWS.


https://github.com/partly-notes/waffle