DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
Kubernetes Networking Tutorial: A Guide for Developers

https://www.freecodecamp.org/news/kubernetes-networking-tutorial-for-developers
KubeElasti

Kubernetes-native scale-to-zero with zero traffic loss, no code changes, and direct integration with Kubernetes resources.
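
For readers unfamiliar with the pattern, here is a minimal sketch of the scale-to-zero idea that KubeElasti automates, using the official kubernetes Python client; the Deployment name and namespace are hypothetical, and a real setup also needs a proxy (which KubeElasti provides) to buffer requests while the workload scales back up.

```python
# Conceptual sketch only: scale a Deployment to zero when idle and back up
# on demand. KubeElasti does this automatically and buffers traffic so no
# requests are lost; the names below are hypothetical.
from kubernetes import client, config

def scale_deployment(name: str, namespace: str, replicas: int) -> None:
    config.load_kube_config()  # use load_incluster_config() inside a pod
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )

scale_deployment("api", "demo", 0)  # idle: scale to zero
scale_deployment("api", "demo", 1)  # first request arrives: scale back up
```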


https://github.com/truefoundry/KubeElasti
opentelemetry-operator

The OpenTelemetry Operator is an implementation of a Kubernetes Operator.


https://github.com/open-telemetry/opentelemetry-operator
zarf

Zarf eliminates the complexity of airgap software delivery for Kubernetes clusters and cloud-native workloads using a declarative packaging strategy to support DevSecOps in offline and semi-connected environments.


https://github.com/zarf-dev/zarf
sriov-network-device-plugin

The SR-IOV Network Device Plugin is a Kubernetes device plugin for discovering and advertising networking resources in the form of:

- SR-IOV virtual functions (VFs)
- PCI physical functions (PFs)
- Auxiliary network devices, in particular Subfunctions (SFs)

which are available on a Kubernetes host.
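
To see where such resources surface, here is a small sketch with the kubernetes Python client that lists the extended resources advertised on each node; the `intel.com/` prefix is only an example of a typical SR-IOV resource name, not a fixed convention.

```python
# List extended resources that device plugins (e.g. the SR-IOV network
# device plugin) have advertised on each node.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    allocatable = node.status.allocatable or {}
    sriov = {k: v for k, v in allocatable.items() if k.startswith("intel.com/")}
    if sriov:
        print(node.metadata.name, sriov)
```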


https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin
Upgrading PostgreSQL with no data loss and minimal downtime

https://palark.com/blog/postgresql-upgrade-no-data-loss-downtime
push-from-k8s-back-to-docker-registry

"Oops, I accidentally deleted my Docker registry. Can I get my images back?" YES. This tool does exactly that.


https://github.com/tazhate/push-from-k8s-back-to-docker-registry
The $1,000 AWS mistake

A cautionary tale about AWS VPC networking, NAT Gateways, and how a missing VPC Endpoint turned our S3 data transfers into an expensive lesson.
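
The remedy implied by the title, adding a Gateway VPC Endpoint for S3 so traffic stops flowing through the per-GB billed NAT Gateway, looks roughly like this with boto3; the VPC ID, route table ID, and region are placeholders.

```python
# Create an S3 Gateway VPC Endpoint so S3 traffic from private subnets
# bypasses the NAT Gateway and its data-processing charges.
# All identifiers and the region below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
```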


https://www.geocod.io/code-and-coordinates/2025-11-18-the-1000-aws-mistake
5 steps to handle peak-season load without overpaying for infrastructure

1️⃣ Determine the maximum load your infrastructure can handle and how much traffic growth to expect during the sale.

2️⃣ Optimize the service and set up fast restore from backup.

3️⃣ Prepare the infrastructure for scaling. In particular, add a CDN to speed up content delivery.

4️⃣ Monitor the situation during peak load and make sure all nodes are working correctly.

5️⃣ Once traffic subsides, return the system to normal operation.

You're doing great! And to make your infrastructure even more cost-effective, connect Selectel's CDN with a discount of up to 50% on additional traffic. Register and submit your application by December 31: https://slc.tl/bs9tj

Advertisement. АО "Селектел". erid: 2W5zFHM31VS
Postgres Internals Hiding in Plain Sight

Postgres has an awesome amount of data collected in its own internal tables. Postgres hackers know all about this - but software developers and folks working on day-to-day Postgres tasks often miss out on the good stuff.

The Postgres catalog is how Postgres keeps track of itself. Of course, Postgres would do this in a relational database with its own schema. Over the years, several nice features, such as psql tools and views, have been added around the internal tables, making them even easier to navigate.

Today I want to walk through some of the most important Postgres internal data catalog details. What they are, what is in them, and how they might help you understand more about what is happening inside your database.
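
As a taste of what the catalog can answer, a small psycopg2 sketch that asks pg_stat_user_tables which tables are read mostly via sequential scans, often a hint that an index is missing; the connection string is a placeholder.

```python
# Query the statistics catalog for the tables with the most sequential
# scans. The DSN is a placeholder.
import psycopg2

conn = psycopg2.connect("dbname=app user=postgres")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT relname, seq_scan, idx_scan
        FROM pg_stat_user_tables
        ORDER BY seq_scan DESC
        LIMIT 10
    """)
    for relname, seq_scan, idx_scan in cur.fetchall():
        print(relname, seq_scan, idx_scan)
```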


https://www.crunchydata.com/blog/postgres-internals-hiding-in-plain-sight
Postgres, Kafka and event queues

After stumbling in the recent Postgres Weekly on a pair of interesting blog posts, somewhat in the style of the good old “flame wars” (which are increasingly rare these days), I felt the urge to chip in a bit. The newer piece, You Don’t Need Kafka, Just Use Postgres (Considered Harmful), is a response to an earlier article, Kafka is fast – I’ll use Postgres, on using Postgres for “Kafkaesque” business.

But first off — I’d like to laud the authors of both pieces. They’re well-argued reads with a crazy amount of good tidbits and food for thought. I especially liked that the original one tried to be open and repeatable and actually tested things. Gunnar’s take was maybe a bit too morbid for my taste, of course 🐘

To recap — the main question in the debate was whether Postgres is generally “good enough” to implement a low-to-medium volume event queue or even a pub-sub system. The general sentiment from Hacker News readers at least was that unless scale truly demands Kafka, Postgres is indeed good enough — and many teams plain overestimate their scaling needs and don’t actually need Kafka’s distributed complexity.

Spoiler: there’s obviously no definitive answer as to when one should use a “proper” database for something — and there sure are reasons why we have so many purpose-specific databases: relational, event logs, analytical column stores, key-value, time-series, ledger, graph, hierarchical, document, blob, text search, vector, …

Anyway, below are some thoughts that came to mind — I can’t go too deep on Kafka though, as I’m just not qualified enough.
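
For anyone who has not seen the pattern the debate revolves around: the usual way to run a modest event/work queue on plain Postgres is SELECT ... FOR UPDATE SKIP LOCKED, sketched below with psycopg2 (table layout and DSN are made up for illustration).

```python
# Minimal Postgres-backed queue consumer step using FOR UPDATE SKIP LOCKED,
# so concurrent workers never pick up the same row. Schema and DSN are
# illustrative only.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS events (
    id        bigserial PRIMARY KEY,
    payload   jsonb NOT NULL,
    processed boolean NOT NULL DEFAULT false
)
"""

def consume_one(conn) -> bool:
    """Claim and process a single unprocessed event; return True if one was found."""
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT id, payload
            FROM events
            WHERE NOT processed
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
        """)
        row = cur.fetchone()
        if row is None:
            return False  # queue empty, or all pending rows claimed by other workers
        event_id, payload = row
        print("processing", event_id, payload)  # real work goes here
        cur.execute("UPDATE events SET processed = true WHERE id = %s", (event_id,))
        return True

conn = psycopg2.connect("dbname=app user=postgres")
with conn, conn.cursor() as cur:
    cur.execute(DDL)
consume_one(conn)
```

SKIP LOCKED is what makes this viable under concurrency: workers skip rows already claimed by another transaction instead of blocking on them.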


https://kmoppel.github.io/2025-11-13-postgres-kafka-and-event-queues
🔥 From engineer to Observability expert: start with the entrance test! 🚀

Modern systems cannot be managed "by feel". Want to understand what is happening with your infrastructure in real time?
Take the entrance test, check your level, and get a discount on the OTUS course "Observability: monitoring, logging, tracing".

You will learn to:
- Build comprehensive monitoring systems based on Prometheus, Grafana, ELK, and OpenTelemetry
- Set up collection of metrics, logs, and traces in distributed systems
- Analyze performance and quickly find the root causes of incidents
- Design fault-tolerant observability infrastructure for production systems

🗓 Course start: Observability: monitoring, logging, tracing

The next cohort starts on December 29, and enrollment is closing.
Format: online classes, real-world cases, a project with a defense, and personal feedback from experts.

To join the cohort at the special price, take the entrance test before the start. Ask the manager for discount details.

📌 Take the first step now: check your knowledge, make sure you are ready to study, and lock in the special course price.

👉 Take the entrance test: https://vk.cc/cREJv1

Advertisement. ООО «Отус онлайн-образование», ОГРН 1177746618576, erid: 2VtzqvtgcLy
VectorChord

VectorChord (vchord) is a PostgreSQL extension designed for scalable, high-performance, and disk-efficient vector similarity search.


https://github.com/tensorchord/VectorChord
baserow

Baserow is the secure, open-source platform for building databases, applications, automations, and AI agents — all without code. Trusted by over 150,000 users, Baserow delivers enterprise-grade security with GDPR, HIPAA, and SOC 2 Type II compliance, plus cloud and self-hosted deployments for full data control. With a built-in AI Assistant that lets you create databases and workflows using natural language, Baserow empowers teams to structure data, automate processes, build internal tools, and create custom dashboards. Fully extensible and API-first, Baserow integrates seamlessly with your existing tools and performs at any scale.


https://github.com/baserow/baserow
doco-cd

Doco-CD is a lightweight GitOps tool that automatically deploys and updates Docker Compose projects/services and Swarm stacks using polling and webhooks.

You can think of it as a simple Portainer or ArgoCD alternative for Docker.


https://github.com/kimdre/doco-cd
PasswordPusher

Securely share sensitive information with automatic expiration & deletion after a set number of views or duration. Track who, what and when with full audit logs.


https://github.com/pglombardo/PasswordPusher
📡 OpenTelemetry: observability on a silver platter

🔥 December 3 at 20:00 MSK: a free webinar from OTUS.
Modern distributed systems cannot be managed "by feel". You need transparency, and that is where OpenTelemetry, the de facto standard for collecting metrics, logs, and traces, comes in.

What we will cover:
– how OpenTelemetry works and why it has become the foundation of modern observability;
– which components make up the stack and how to use them properly;
– how to collect and visualize data about how your services are running;
– how to introduce OpenTelemetry into a microservice architecture and connect it with Grafana, Prometheus, Jaeger, and Zipkin;
– best practices for setting up observability and reducing MTTR (time to recovery).

Level up your observability and learn to catch failures before they show up.

👉 Register here:
https://vk.cc/cRHnD6

The session is timed to the start of the course "Observability: monitoring, logging, tracing", where you will learn to build comprehensive monitoring systems, work with Prometheus, Grafana, ELK, and OpenTelemetry, and visualize metrics so that no error goes unnoticed.

Advertisement. ООО «Отус онлайн-образование», ОГРН 1177746618576, erid: 2VtzqwmU2gh
postgresus

A free, open-source, self-hosted solution for automated PostgreSQL backups, with multiple storage options and notifications.


https://github.com/RostislavDugin/postgresus
Migrating from GitHub to Codeberg

Ever since git init ten years ago, Zig has been hosted on GitHub. Unfortunately, when it sold out to Microsoft, the clock started ticking. “Please just give me 5 years before everything goes to shit,” I thought to myself. And here we are, 7 years later, living on borrowed time.

Putting aside GitHub’s relationship with ICE, it’s abundantly clear that the talented folks who used to work on the product have moved on to bigger and better things, with the remaining rookies eager to inflict some kind of bloated, buggy JavaScript framework on us in the name of progress. Stuff that used to be snappy is now sluggish and often entirely broken.

More importantly, Actions is created by monkeys and completely neglected. After the CEO of GitHub said to “embrace AI or get out”, it seems the lackeys at Microsoft took the hint, because GitHub Actions started “vibe-scheduling”: choosing jobs to run seemingly at random. Combined with other bugs and the inability to intervene manually, this causes our CI system to get so backed up that not even master branch commits get checked.

Rather than wasting donation money on more CI hardware to work around this crumbling infrastructure, we’ve opted to switch Git hosting providers instead.

As a bonus, we look forward to fewer violations (exhibit A, B, C) of our strict no LLM / no AI policy, which I believe are at least in part due to GitHub aggressively pushing the “file an issue with Copilot” feature in everyone’s face.


https://ziglang.org/news/migrating-from-github-to-codeberg