DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor) registration: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
How We Scaled Code Repository Management at DNSimple

Managing a handful of GitHub repositories is straightforward. Managing hundreds of them consistently is a challenge. Over the years at DNSimple, we've evolved from manual configuration to a fully automated Infrastructure as Code (IaC) approach. This is the story of that evolution, the lessons we learned, and how we built a system that now manages all our GitHub resources through pull requests and CI/CD pipelines.

At DNSimple, we've managed our internal infrastructure as code since day one, primarily using Chef for configuration management. Infrastructure as Code wasn't new to us; it was the foundation of how we operated. The challenge was applying the same principles to externally managed resources like GitHub repositories, which required a different approach from our traditional internal infrastructure management.
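The heart of managing repositories as code is desired-state reconciliation: declare what each repository should look like, compare that with what actually exists, and apply only the difference. The sketch below shows that "plan" step in plain Python. The repository names and settings are hypothetical, and this is a generic illustration of the pattern, not DNSimple's Terraform configuration.

```python
from dataclasses import dataclass, field

# Hypothetical repository settings. A real system would load the desired
# state from version-controlled config and the current state from the GitHub API.
Settings = dict[str, object]


@dataclass
class Plan:
    create: list[str] = field(default_factory=list)
    update: dict[str, Settings] = field(default_factory=dict)  # repo -> changed keys only
    delete: list[str] = field(default_factory=list)


def plan(desired: dict[str, Settings], current: dict[str, Settings]) -> Plan:
    """Compute the changes needed to move `current` to `desired`."""
    p = Plan()
    for name, want in sorted(desired.items()):
        if name not in current:
            p.create.append(name)
            continue
        have = current[name]
        # Keep only settings whose values differ (manual drift or an intended change).
        diff = {k: v for k, v in want.items() if have.get(k) != v}
        if diff:
            p.update[name] = diff
    # Repositories that exist but are no longer declared anywhere.
    p.delete = sorted(set(current) - set(desired))
    return p


desired = {
    "api": {"visibility": "private", "delete_branch_on_merge": True},
    "docs": {"visibility": "public", "delete_branch_on_merge": True},
}
current = {
    "api": {"visibility": "private", "delete_branch_on_merge": False},
    "legacy-tool": {"visibility": "private", "delete_branch_on_merge": False},
}

result = plan(desired, current)
print(result)
```

Putting this plan output in a pull request for review before a CI/CD pipeline applies it is what makes every repository change visible and auditable.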


https://blog.dnsimple.com/2025/11/managing-repositories-terraform-github
The stacking workflow

Stacked PRs. Stacked diffs. Stacked changes.
A better workflow to manage pull requests.


https://www.stacking.dev
Monitoring & Observability: Using Logs, Metrics, Traces, and Alerts to Understand System Failures

When your application ships to production, it becomes partly opaque. You own the code, but the runtime, network, and platform behaviors often fall outside your direct line of sight. That’s where Monitoring and Observability come in.

Monitoring warns you when predefined thresholds are crossed. Observability lets you explore unknowns: asking new questions in real time and getting meaningful answers without redeploying.

For engineers running software in production, observability rests on three pillars: logs, metrics, and traces. Each offers a different lens into system behavior. Understanding where each excels and where it doesn’t is essential for building a practical, scalable visibility strategy.


https://blog.railway.com/p/using-logs-metrics-traces-and-alerts-to-understand-system-failures
KISS vs DRY in Infrastructure as Code: Why Simple Often Beats Clever

Every Infrastructure as Code tutorial starts the same way: provision a single S3 bucket, create one EC2 instance, deploy a basic load balancer. The examples are clean, simple, and elegant. You follow along, everything works, and you feel like you understand Terraform.

Then you get to your actual production environment, and everything changes.

You’re not starting from scratch with a blank AWS account. You’ve got existing resources that were manually created two years ago by someone who left the company. There’s brownfield infrastructure everywhere with no clear documentation. You need to import existing state, figure out what’s actually running, and somehow wrangle it all into code without breaking production. On top of that, you need to manage 200 instances across dev, staging, and production environments. Multiple AWS accounts with different configurations and permissions. Three regions for disaster recovery. Azure for the legacy workloads that nobody wants to touch. GCP running your GKE clusters for the containerized applications.

Suddenly that elegant tutorial code becomes a nightmare of orchestration, state management, environment-specific configurations, and brownfield complexity. You’re not just writing infrastructure code anymore. You’re trying to organize, orchestrate, and maintain it at scale while dealing with the reality that infrastructure is messy, evolving, and full of historical baggage.

This is the scale gap, and it’s where the KISS vs DRY debate stops being theoretical and starts costing real time, money, and engineering effort.


https://rosesecurity.dev/2025/11/14/kiss-versus-dry-iac.html
pg_textsearch

PostgreSQL extension for BM25 relevance-ranked full-text search, released under the open-source PostgreSQL License.
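BM25 ranks a document higher when it contains rare query terms, gives diminishing returns for repeating a term, and penalizes documents that are longer than average. The minimal pure-Python sketch below implements the standard Okapi BM25 formula with whitespace tokenization and the common defaults k1 = 1.2 and b = 0.75. It uses the Lucene-style IDF variant, which never goes negative. It illustrates the ranking function only; pg_textsearch's tokenization, IDF variant, and parameters may differ.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query (naive whitespace tokenizer)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        dl = len(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Inverse document frequency: rare terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "postgres full text search with bm25 ranking",
    "postgres replication and backups",
    "ranking search results in postgres postgres postgres",
]
scores = bm25_scores("bm25 ranking", docs)
print(scores)
```

The first document ranks highest because it matches both query terms, including the rare term "bm25"; the second matches neither and scores zero.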


https://github.com/timescale/pg_textsearch
pgedge-postgres-mcp

The pgEdge Postgres Model Context Protocol (MCP) server enables SQL queries against PostgreSQL databases through MCP-compatible clients such as Claude Desktop. Its Natural Language Agent lets you form SQL queries using natural language.


https://github.com/pgEdge/pgedge-postgres-mcp
arcane

Modern Docker Management, Designed for Everyone


https://github.com/getarcaneapp/arcane