Practical Considerations for AI Incident Reviews
https://fgj.codes/posts/ai-incident-reviews
The post argues AI-written incident reviews fail without rich cross-system data and human engagement because incident reviews are socio-technical learning work, not just document generation.
https://fgj.codes/posts/ai-incident-reviews
Webinar: Disk Overflow Protection Mechanisms in Databases
What should you do when a disk fills up? You can urgently clear the package manager cache or delete old logs, but the more important question is how to avoid such situations in the first place. How do you build a storage system that does not overpay for autoscaling yet still protects you from downtime? Join our webinar to learn more about working with disks and take part in a discussion of real-world cases.
April 16, 16:00 (MSK)
What we will cover
- Why WAL files are needed and what happens if you delete them
- Non-obvious causes of disk overflow and how they affect database availability
- What the consequences of downtime cost, and which tools help fix things
- A checklist of tools for avoiding WAL disk overflow: monitoring, alerting, and load profiling
Who will find the webinar useful
- DevOps/SRE engineers
- DBAs and database support engineers
- Cloud solution architects
- Technical leads responsible for service stability
Register
10 Real-World Status Page Examples: And What You Can Learn From Them
https://uptimerobot.com/blog/10-real-status-page-examples
The post walks through ten status page examples and highlights clear communication, simple layouts, and expectation-setting details that help users during incidents.
https://uptimerobot.com/blog/10-real-status-page-examples
Disappointing People Early
https://log.andvari.net/disappointing-people-early.html
The post argues teams should make reliability targets, support limits, and roadmap uncertainty explicit early so customers and stakeholders do not build riskier implicit expectations.
https://log.andvari.net/disappointing-people-early.html
5 Suggestions to Upgrade your OpenTofu/Terraform & AWS Development Experience
https://www.uturndata.com/insights/5-suggestions-upgrade-opentofu-terraform-aws-development-experience
Five practical DX improvements for daily OpenTofu/Terraform + AWS work: use `tenv` for seamless version switching, a `grep` alias to summarize plans quickly, `tflint` with cloud provider plugins for linting, `awsp` for fast AWS profile switching, and a customized shell prompt showing the current branch/workspace/profile at a glance to prevent costly wrong-context mistakes.
https://www.uturndata.com/insights/5-suggestions-upgrade-opentofu-terraform-aws-development-experience
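As a rough illustration of the plan-summary idea in the entry above, here is a minimal Python sketch rather than the post's actual `grep` alias. It assumes the plan was captured as plain text (for example with `-no-color`) and that per-resource headlines follow the usual `# <address> will be created` style; `summarize_plan` and the regular expressions are hypothetical names introduced for this example.

```python
import re

# Assumption: these are the per-resource headline phrasings printed by
# `terraform plan` / `tofu plan` when run with -no-color.
ACTION_LINE = re.compile(
    r"^\s*# (?P<address>\S+) (?P<action>will be created|will be destroyed|"
    r"will be updated in-place|must be replaced)"
)
# The final totals line, e.g. "Plan: 2 to add, 0 to change, 1 to destroy."
TOTALS_LINE = re.compile(r"^\s*Plan: ")


def summarize_plan(plan_text: str) -> list[str]:
    """Keep only the per-resource action headlines and the totals line."""
    summary = []
    for line in plan_text.splitlines():
        match = ACTION_LINE.match(line)
        if match:
            summary.append(f"{match['address']}: {match['action']}")
        elif TOTALS_LINE.match(line):
            summary.append(line.strip())
    return summary


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): terraform plan -no-color | python summarize_plan.py
    for entry in summarize_plan(sys.stdin.read()):
        print(entry)
```

Resource addresses that contain spaces (such as string keys with spaces in `for_each`) would be skipped by this simple pattern.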
Terraform Drift Detection Powered by GitHub Actions
https://rosesecurity.dev/2025/12/11/terraform-drift-detection-with-github-actions.html
A zero-cost drift detection pipeline built entirely on GitHub Actions auto-discovers root modules, runs daily parallel plans with Terraform's native `-detailed-exitcode` flag, and opens GitHub Issues when drift is detected. It needs no external tools or paid services and uses OIDC for keyless AWS auth.
https://rosesecurity.dev/2025/12/11/terraform-drift-detection-with-github-actions.html
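The drift check in the entry above rests on how `terraform plan -detailed-exitcode` reports its result: exit code 0 means the plan succeeded with no changes, 1 means an error, and 2 means the plan succeeded with a non-empty diff. The following Python sketch shows one way to interpret those codes. It is not the post's workflow; `PlanResult`, `classify_exit_code`, and `plan_module` are hypothetical names, and the sketch assumes the module directory has already been initialized with `terraform init`.

```python
import subprocess
from enum import Enum


class PlanResult(Enum):
    NO_CHANGES = 0  # plan succeeded, empty diff
    ERROR = 1       # plan failed
    DRIFT = 2       # plan succeeded, non-empty diff


def classify_exit_code(code: int) -> PlanResult:
    """Map a `plan -detailed-exitcode` exit status to a result."""
    try:
        return PlanResult(code)
    except ValueError:
        # Assumption: treat any unexpected status as an error.
        return PlanResult.ERROR


def plan_module(module_dir: str, binary: str = "terraform") -> PlanResult:
    """Run a plan in module_dir and classify the outcome."""
    completed = subprocess.run(
        [binary, "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=module_dir,
        capture_output=True,
        text=True,
    )
    return classify_exit_code(completed.returncode)
```

A scheduled job could call `plan_module` for each root module and open an issue only when the result is `PlanResult.DRIFT`, keeping `PlanResult.ERROR` as a separate signal so a broken plan is not mistaken for drift.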
InfraKitchen
https://opensource.electrolux.one/infrakitchen
An open-source platform from Electrolux that lets platform teams define reusable Terraform templates while enabling developers to self-serve multi-cloud infrastructure (AWS, Azure, GCP) via pull-request-driven continuous delivery, with audit logging and an MCP server for AI agent integration.
https://opensource.electrolux.one/infrakitchen
nono
https://github.com/always-further/nono
AI agents get filesystem access, run shell commands, and are wide open to prompt injections. The standard response is guardrails and policies. The problem is that policies can be bypassed — and guardrails can be talked out of.
With nono, you don't have to rely on either. nono wraps your agent in a kernel-isolated sandbox in seconds, with API key protection, destructive action guardrails, and full snapshot/rollback built in. No hypervisor to configure. No container volume mounts; instead, fine-grained capability control down to the file level. Zero latency overhead.
https://github.com/always-further/nono
nanobrew
https://github.com/justrach/nanobrew
A fast package manager for macOS and Linux. Written in Zig. Uses Homebrew's bottles and formulas under the hood, plus native .deb support for Docker containers.
https://github.com/justrach/nanobrew
Automating RDS Postgres to Aurora Postgres Migration
https://netflixtechblog.com/automating-rds-postgres-to-aurora-postgres-migration-261ca045447f
In 2024, the Online Data Stores team at Netflix conducted a comprehensive review of the relational database technologies used across the company. This evaluation examined functionality, performance, and total cost of ownership across our database ecosystem. Based on this analysis, we decided to standardize on Amazon Aurora PostgreSQL as the primary relational database offering for Netflix teams.
https://netflixtechblog.com/automating-rds-postgres-to-aurora-postgres-migration-261ca045447f
Safeguarding dynamic configuration changes at scale
https://medium.com/airbnb-engineering/safeguarding-dynamic-configuration-changes-at-scale-5aca5222ed68
How Airbnb ships dynamic config changes safely and reliably.
https://medium.com/airbnb-engineering/safeguarding-dynamic-configuration-changes-at-scale-5aca5222ed68
How to cut your Docker build time by 95%, Buildx, Caching & Layer Optimization
https://arcnet.am/post/70
Docker builds taking forever? I cut mine from 8 min to 24 sec. Here's how using Buildx and caching.
https://arcnet.am/post/70
Terraform Parallelism: How It Works, Tuning & Best Practices
https://spacelift.io/blog/terraform-parallelism
In this blog post, we will explore Terraform parallelism: what it is, how to manage it, and best practices for configuring parallelism in Terraform.
https://spacelift.io/blog/terraform-parallelism
Still deploying PostgreSQL by hand?
Save your energy for development work.
On April 21 at 16:00 (MSK), MWS Cloud Platform will host a webinar where the company's experts explain how to get a ready-made database for your backend in a few minutes.
On the agenda:
⚫ cloud PostgreSQL: pros and cons of the approach;
⚫ how the managed service works in the new cloud from MWS Cloud;
⚫ the machinery under the hood of backups, automatic updates, switchover, and failover;
⚫ we will create a cluster in a few minutes and set up a connection.
The webinar will be of interest to database administrators (DBAs), backend developers, DevOps and SRE engineers, technical leads and architects, product owners, and startups.
Register
4 ways to use Argo CD and Terraform together
https://octopus.com/blog/argocd-terraform-together
Terraform is the most popular solution for implementing Infrastructure As Code (IaC). The Terraform provider registry contains a very large collection of providers/integrations for all the major cloud providers and at the same time offers a wealth of integration for databases, networking components, Continuous Integration platforms etc.
Argo CD is the leading solution for GitOps deployments on Kubernetes. In the last CNCF survey we found out that 60% of respondents use Argo CD in production.
Although several guides currently exist that explain how to use each tool individually, there is limited information on how they can be combined. A lot of existing Terraform users adopt Argo CD and wonder:
1. What is the best way to pass variables from Terraform to Helm charts deployed with Terraform?
2. How to get secrets in Kubernetes applications that are generated/retrieved from Terraform?
3. When should the Terraform Helm and Kubernetes providers come into play if Argo CD already supports Kubernetes deployments on its own?
4. For which Kubernetes resources should Terraform be responsible and for which Argo CD?
5. What is the proper boundary between the two tools so that operators can use them to the maximum benefit?
In this guide, we will answer all these questions and actually show you four different approaches for how Terraform and Argo CD can work together. Note that everything we say about Terraform also applies to OpenTofu.
https://octopus.com/blog/argocd-terraform-together
Migrating Etsy's database sharding to Vitess
https://www.etsy.com/codeascraft/migrating-etsyas-database-sharding-to-vitess
This database cluster contains most of Etsy's online data and is made up of ~1,000 tables distributed across ~1,000 shards.
https://www.etsy.com/codeascraft/migrating-etsyas-database-sharding-to-vitess
We Automated Everything Except Knowing What's Going On
https://eversole.dev/blog/we-automated-everything
AI collapsed the cost of building software, but the systems underneath are buckling.
https://eversole.dev/blog/we-automated-everything
Why our Kafka consumers survived the day but died every night
https://medium.com/@lokeshsoni/why-our-kafka-consumers-survived-the-day-but-died-every-night-8c9eb6ae528f
It took us 4–5 incidents over several weeks to even recognise the pattern.
https://medium.com/@lokeshsoni/why-our-kafka-consumers-survived-the-day-but-died-every-night-8c9eb6ae528f
Reliability Engineering for Air-Gapped Systems
https://blog.alexewerlof.com/p/reliability-engineering-for-air-gapped
All those systems were air-gapped, meaning the team that builds the software has no access to metrics, logs or runtime.
https://blog.alexewerlof.com/p/reliability-engineering-for-air-gapped
How I Dragged Phantom Tide Out of an OOM Kill Loop
https://github.com/tg12/phantomtide/blob/main/docs/oom-postmortem.md
From the inside, it was a systems failure spread across FastAPI, uvicorn, Redis, ClickHouse, APScheduler, Docker memory limits, and a startup sequence that had quietly become a deterministic self-attack.
https://github.com/tg12/phantomtide/blob/main/docs/oom-postmortem.md
Shell Tricks That Actually Make Life Easier (And Save Your Sanity)
https://blog.hofstede.it/shell-tricks-that-actually-make-life-easier-and-save-your-sanity
There is a distinct, visceral kind of pain in watching an otherwise brilliant engineer hold down the Backspace key for six continuous seconds to fix a typo at the beginning of a line.
We’ve all been there. We learn ls, cd, and grep, and then we sort of… stop. The terminal becomes a place we live in, but we rarely bother to arrange the furniture. We accept that certain tasks take forty keystrokes, completely unaware that the shell authors solved our exact frustration sometime in 1989.
Here are some tricks that aren’t exactly secret, but aren’t always taught either. To keep the peace in our extended Unix family, I’ve split these into two camps: the universal tricks that work on almost any POSIX-ish shell (like sh on FreeBSD or ksh on OpenBSD), and the quality-of-life additions specific to interactive shells like Bash or Zsh.
https://blog.hofstede.it/shell-tricks-that-actually-make-life-easier-and-save-your-sanity