DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor): https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
The Art of Command Line

Master the command line, in one page


https://github.com/jlevy/the-art-of-command-line
Where you end up tomorrow depends on what you learn today. PostgreSQL is a tool that companies are looking for, and skilled specialists in it are still scarce.

Why PostgreSQL? Because it is not just a database but the heart of your projects. If you are a DBA, developer, DevOps engineer, or Linux administrator, this course is your upgrade.

We will teach you to configure clusters, tune performance, deal with locks, and handle large volumes of data. The course also includes live lectures, practical assignments, and a diploma recognized by market leaders. Learn from practitioners who know how to solve real problems, and gain skills that top companies pay for.

Join the course "PostgreSQL for Database Administrators and Developers" now and start your path to a well-paid career!

Apply: https://vk.cc/cU48kH

🎁 Bonus: 5% off tuition with promo code go_dba. Offer valid until 12.02

Advertisement. ООО «Отус онлайн-образование», OGRN 1177746618576, erid: 2VtzqvsJ3fH
The only Terraform pipeline you will ever need: GitHub Actions for Multi-Environment Deployments

https://medium.com/zencore/the-only-terraform-pipeline-you-will-ever-need-github-actions-for-multi-environment-deployments-a2cb25d72473
Managing Terraform at Scale: A Deep Dive into Terragrunt Configuration Hierarchy

How I manage 100+ infrastructure components across multiple products, environments, and regions without configuration duplication


https://devineer.medium.com/managing-terraform-at-scale-a-deep-dive-into-terragrunt-configuration-hierarchy-54f1f16e7c1f
Futureproofing Tines: Partitioning a 17TB table in PostgreSQL

At Tines, we recently faced a significant engineering challenge: our output_payloads table in PostgreSQL was rapidly approaching 17TB on our largest cloud cluster, with no signs of slowing down. Once a table reaches PostgreSQL’s 32TB table size limit, it will stop accepting writes. This table holds event data, in the form of arbitrary JSON, which is critical to powering Tines workflows. Given the criticality of the data, we couldn’t risk any disruptions to it.
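
The 32TB figure follows from how PostgreSQL addresses table blocks: block numbers are 32-bit, and the default block size is 8 kB. The sketch below only reproduces that arithmetic. The growth rate in the usage example is a made-up number for illustration, not a figure from the Tines post.

```python
BLOCK_SIZE = 8 * 1024   # default PostgreSQL block size (BLCKSZ), in bytes
MAX_BLOCKS = 2 ** 32    # block numbers are 32-bit, so roughly 2^32 blocks per table
TIB = 1024 ** 4


def max_table_bytes(block_size: int = BLOCK_SIZE) -> int:
    """Approximate upper bound on the size of a single table."""
    return MAX_BLOCKS * block_size


def days_until_limit(current_bytes: float, growth_bytes_per_day: float,
                     block_size: int = BLOCK_SIZE) -> float:
    """Days of headroom left, assuming constant (hypothetical) linear growth."""
    remaining = max_table_bytes(block_size) - current_bytes
    return remaining / growth_bytes_per_day


if __name__ == "__main__":
    print(max_table_bytes() / TIB)  # 32.0
    # Hypothetical growth of 0.05 TiB per day from a 17 TiB table.
    print(days_until_limit(17 * TIB, 0.05 * TIB))
```

Partitioning sidesteps the limit because it applies to each partition separately rather than to the partitioned table as a whole.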

As our monitoring showed the table's growth, we began experiencing warning signs. Cleanup jobs on the table had begun to time out. The table was causing increased I/O pressure on our infrastructure, leading us to use more expensive hardware. The arbitrary JSON shape of the data meant massive autovacuum jobs on its TOAST table. When these autovacuums ran, they displaced other tables from the buffer cache, forcing disk reads in critical areas. As a band-aid, we modified the autovacuum parameters of the table so that the autovacuums would run more frequently, but have fewer tuples to process. With performance slowly degrading, and 32TB looming on the horizon, we knew we needed to act decisively.
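
To see why tuning makes autovacuum run more often with less work per run: PostgreSQL vacuums a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples` (defaults 50 and 0.2). A minimal sketch of that formula follows. The row count and the tuned scale factor are hypothetical, since the post does not state the values Tines used.

```python
def vacuum_trigger_point(reltuples: int, threshold: int = 50,
                         scale_factor: float = 0.2) -> float:
    """Number of dead tuples at which autovacuum will vacuum the table."""
    return threshold + scale_factor * reltuples


if __name__ == "__main__":
    rows = 1_000_000_000  # hypothetical table size in rows

    # PostgreSQL defaults: vacuum only after roughly 20% of the table is dead.
    print(vacuum_trigger_point(rows))

    # Hypothetical per-table override: vacuum after roughly 1% is dead,
    # so each run starts sooner and has far fewer tuples to process.
    print(vacuum_trigger_point(rows, scale_factor=0.01))
```

Per-table values like these are set as storage parameters with `ALTER TABLE ... SET (...)`, so other tables keep the global defaults.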


https://www.tines.com/blog/futureproofing-tines-partitioning-a-17tb-table-in-postgresql
Chasing Boring at Just the Right Speed

https://log.andvari.net/no-mttr.html
fluid.sh

AI agents are ready to do infrastructure work, but they can't touch prod:

- Agents can install packages, configure services, write scripts—autonomously
- But one mistake on production and you're getting paged at 3 AM to fix it
- So we limit agents to chatbots instead of letting them do the work


https://github.com/aspectrr/fluid.sh
This is your invitation to Deckhouse User Community meetup #4

Who: engineers who work with Kubernetes
When: February 26
Where: Moscow, in person

At the meetup you will hear about running Kubernetes on top of any operating system, real-world experience of operating the platform single-handedly, home virtualization on budget hardware, and a practical approach to security.

The event's killer feature is the interactive "Try it yourself" zone with a deployed Deckhouse Kubernetes Platform Community Edition cluster. Test the platform with your own hands, and Deckhouse engineers will help you find your way around.

Registration
graft

Graft is a CLI tool that brings the Overlay Pattern (similar to Kustomize) to Terraform. It acts as a JIT (Just-In-Time) Compiler, allowing you to apply declarative patches to third-party modules at build time.

With Graft, you can treat upstream modules (e.g., from the Public Registry) as immutable base layers and inject your own logic on top—without the maintenance nightmare of forking.
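
The overlay idea itself can be sketched independently of Graft's actual patch syntax, which is not shown here. It amounts to a deep merge of a small patch onto an untouched base. The dictionaries and key names below are hypothetical and only illustrate the pattern. This is not how Graft is implemented.

```python
import copy


def apply_overlay(base: dict, patch: dict) -> dict:
    """Return a new dict with patch merged on top of base; base is not modified."""
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: merge recursively instead of replacing.
            result[key] = apply_overlay(result[key], value)
        else:
            # Scalars, lists, and new keys: the patch wins.
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical settings of an upstream module, treated as an immutable base layer.
upstream = {"bucket": {"versioning": False, "tags": {"team": "platform"}}}

# Local overlay that states only the differences.
overlay = {"bucket": {"versioning": True, "tags": {"env": "prod"}}}

patched = apply_overlay(upstream, overlay)
print(patched)
```

Because the base is never edited, upgrading the upstream module means re-applying the same patch to the new version rather than rebasing a fork.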


https://github.com/ms-henglu/graft
Owning a $5M data center

These days it seems you need a trillion fake dollars, or lunch with politicians to get your own data center. They may help, but they’re not required. At comma we’ve been running our own data center for years. All of our model training, metrics, and data live in our own data center in our own office. Having your own data center is cool, and in this blog post I will describe how ours works, so you can be inspired to have your own data center too.


https://blog.comma.ai/datacenter
Kubernetes in production: from CI/CD to security and fault tolerance

👩‍💻 Kubernetes course: automate your infrastructure and prepare for CKA/CKAD

Take the test and reserve a place on the OTUS course. You will also get a discount 🎁 until 15.02.2026; ask a manager for details.

TAKE THE TEST: https://vk.cc/cUeS0D

The course "Infrastructure Platform Based on Kubernetes" teaches you to design and launch platforms for digital products: IaC, Kubernetes mechanisms, the tool ecosystem, and cluster operations. The program from Express 42 is practice-oriented and suits tech leads, software architects, developers, DevOps engineers, and administrators.

🎁 Bonus: a recorded course of your choice:
- Elastic/OpenSearch Advanced
- Advanced Java
- GitOps

Advertisement. ООО «Отус онлайн-образование», OGRN 1177746618576, erid: 2VtzqwTHk9Z
whosthere

Local Area Network discovery tool with a modern Terminal User Interface (TUI) written in Go. Discover, explore, and understand your LAN in an intuitive way.

Whosthere performs unprivileged, concurrent scans using mDNS and SSDP scanners. Additionally, it sweeps the local subnet by attempting TCP/UDP connections to trigger ARP resolution, then reads the ARP cache to identify devices on your Local Area Network. This technique populates the ARP cache without requiring elevated privileges. All discovered devices are enhanced with OUI lookups to display manufacturers when available.
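
The ARP-cache technique described above can be sketched in a few lines. This is a Python illustration under stated assumptions, not whosthere's Go implementation: it assumes Linux (the ARP cache is read from `/proc/net/arp`), uses UDP only, and the function names, the port, and the example subnet are all hypothetical.

```python
import ipaddress
import socket
import time


def parse_arp_table(text: str) -> dict[str, str]:
    """Parse the contents of /proc/net/arp into {ip: mac}, skipping unresolved entries."""
    entries = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        # Columns: IP address, HW type, Flags, HW address, Mask, Device
        if len(fields) >= 6 and fields[3] != "00:00:00:00:00:00":
            entries[fields[0]] = fields[3]
    return entries


def sweep(cidr: str, port: int = 9) -> None:
    """Send one empty UDP datagram per host so the kernel attempts ARP resolution."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setblocking(False)
        for host in ipaddress.ip_network(cidr, strict=False).hosts():
            try:
                sock.sendto(b"", (str(host), port))
            except OSError:
                pass  # unreachable hosts and full send buffers are expected


def discover(cidr: str, arp_path: str = "/proc/net/arp") -> dict[str, str]:
    """Sweep the subnet, wait briefly for replies, then read the ARP cache."""
    sweep(cidr)
    time.sleep(1.0)
    with open(arp_path) as fh:
        return parse_arp_table(fh.read())


if __name__ == "__main__":
    # Hypothetical subnet; replace with your own LAN range.
    for ip, mac in discover("192.168.1.0/24").items():
        print(ip, mac)
```

Neither sending a UDP datagram nor reading `/proc/net/arp` needs root, which is what makes this approach unprivileged.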

Whosthere provides a friendly, intuitive way to answer the question every network administrator asks: "Who's there on my network?"


https://github.com/ramonvermeulen/whosthere
zerobrew

zerobrew applies uv's model to Mac packages. Packages live in a content-addressable store (by sha256), so reinstalls are instant. Downloads, extraction, and linking run in parallel with aggressive HTTP caching. It pulls from Homebrew's CDN, so you can swap brew for zb with your existing commands.
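
A content-addressable store keyed by sha256 can be sketched as follows. This is a minimal Python illustration of the idea, not zerobrew's code. The class name, method names, and on-disk layout are assumptions made for the example.

```python
import hashlib
from pathlib import Path


class ContentStore:
    """Stores blobs on disk under the sha256 hex digest of their content."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> tuple[str, bool]:
        """Store data and return (digest, was_written)."""
        digest = hashlib.sha256(data).hexdigest()
        path = self.root / digest
        if path.exists():
            return digest, False  # same content already stored: nothing to do
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)  # rename into place so readers never see a partial file
        return digest, True

    def get(self, digest: str) -> bytes:
        """Return the blob stored under the given digest."""
        return (self.root / digest).read_bytes()
```

Because the key is derived from the content, storing the same package a second time is just an existence check, which is why reinstalls can be near-instant.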

This leads to dramatic speedups, up to 5x cold and 20x warm.


https://github.com/lucasgelfond/zerobrew
Hi! My good friend is looking for a colleague to join their team.

You can check the details of the position and apply here: https://jobs.ashbyhq.com/perplexity/7bce0fcf-eef6-41aa-9243-896f07a0316e

If you have additional questions about the position, you can send them to alena@perplexity.ai.
prek

pre-commit is a framework to run hooks written in many languages, and it manages the language toolchain and dependencies for running the hooks. prek is a reimplementation of pre-commit written in Rust.


https://github.com/j178/prek
The future of software engineering is SRE

When code gets cheap, operational excellence wins. Anyone can build a greenfield demo, but it takes engineering to run a service.


https://swizec.com/blog/the-future-of-software-engineering-is-sre
10 Elasticsearch Production Issues (and How Postgres Avoids Them)

Elasticsearch may work great in initial testing and development but Production is a different story. This blog is about what happens after you ship: the JVM tuning, the shard math, the 3 AM pages, the sync pipelines that break silently. The stuff your ops team lives with.

After years of teams running Elasticsearch in production, certain patterns keep emerging. The same issues show up in blog posts, Stack Overflow questions, and incident reports. We've compiled ten of the most common ones below, with references to the engineers who've documented them. We’ve also added images to make it easy to quickly skim through it and compare the challenges against Postgres.

TLDR: With great power comes great operational complexity.


https://www.tigerdata.com/blog/10-elasticsearch-production-issues-how-postgres-avoids-them
How OpenAI Scales Postgres to Power 800 Million ChatGPT Users

For years, PostgreSQL has been one of the most critical, under-the-hood data systems powering core products like ChatGPT and OpenAI’s API. As our user base grows rapidly, the demands on our databases have increased exponentially, too. Over the past year, our PostgreSQL load has grown by more than 10x, and it continues to rise quickly.


https://openai.com/index/scaling-postgresql
Introduction to Buffers in PostgreSQL

The work around RegreSQL led me to focus a lot on buffers. If you are a casual PostgreSQL user, you have probably heard about adjusting shared_buffers and followed the good old advice to set it to 1/4 of available RAM. But after we got a little too enthusiastic about them on a recent Postgres FM episode, I was asked what that's all about.
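
As a quick illustration of the "1/4 of RAM" rule of thumb, the sketch below turns a RAM size into a shared_buffers starting point and the number of 8 kB pages it corresponds to. The function name and the 64 GiB example are hypothetical, and the result is a starting point for tuning, not a recommendation.

```python
PAGE_SIZE = 8 * 1024  # default PostgreSQL block size, in bytes
GIB = 1024 ** 3


def shared_buffers_starting_point(total_ram_bytes: int,
                                  fraction: float = 0.25) -> tuple[int, int]:
    """Return (bytes, number_of_8kB_pages) for the given fraction of RAM."""
    pages = int(total_ram_bytes * fraction) // PAGE_SIZE
    return pages * PAGE_SIZE, pages


if __name__ == "__main__":
    size, pages = shared_buffers_starting_point(64 * GIB)  # hypothetical 64 GiB server
    print(size // GIB, pages)  # prints: 16 2097152
```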

Buffers are one of those topics that easily gets forgotten. And while they are a foundation block of PostgreSQL's performance architecture, most of us treat them as a black box. This article is going to attempt to change that.


https://boringsql.com/posts/introduction-to-buffers