DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
10 Elasticsearch Production Issues (and How Postgres Avoids Them)

Elasticsearch may work great in initial testing and development, but production is a different story. This blog is about what happens after you ship: the JVM tuning, the shard math, the 3 AM pages, the sync pipelines that break silently. The stuff your ops team lives with.

After years of teams running Elasticsearch in production, certain patterns keep emerging. The same issues show up in blog posts, Stack Overflow questions, and incident reports. We've compiled ten of the most common ones below, with references to the engineers who've documented them. We've also added images to make it easy to skim the list and compare each challenge against Postgres.

TLDR: With great power comes great operational complexity.


https://www.tigerdata.com/blog/10-elasticsearch-production-issues-how-postgres-avoids-them
How OpenAI Scales Postgres to Power 800 Million ChatGPT Users

For years, PostgreSQL has been one of the most critical, under-the-hood data systems powering core products like ChatGPT and OpenAI’s API. As our user base grows rapidly, the demands on our databases have increased exponentially, too. Over the past year, our PostgreSQL load has grown by more than 10x, and it continues to rise quickly.


https://openai.com/index/scaling-postgresql
Introduction to Buffers in PostgreSQL

The work around RegreSQL led me to focus a lot on buffers. If you are a casual PostgreSQL user, you have probably heard about adjusting shared_buffers and followed the good old advice to set it to 1/4 of available RAM. But after we got a little too enthusiastic about them on a recent Postgres FM episode, I've been asked what that's all about.

Buffers are one of those topics that easily get forgotten. And while they are a foundational block of PostgreSQL's performance architecture, most of us treat them as a black box. This article is going to attempt to change that.


https://boringsql.com/posts/introduction-to-buffers
Is the future of MySQL PostgreSQL (Or MariaDB, or TiDB, or ...)?

https://stokerpostgresql.blogspot.com/2026/01/is-future-of-mysql-postgresql-or.html
“You Had One Job”: Why Twenty Years of DevOps Has Failed to Do it

I think the entire DevOps movement was a mighty, twenty-year battle to achieve one thing: a single feedback loop connecting devs with prod. On those grounds, it failed.


https://www.honeycomb.io/blog/you-had-one-job-why-twenty-years-of-devops-has-failed-to-do-it
OpenTelemetry Collector vs agent: How to choose the right telemetry approach

https://www.cncf.io/blog/2026/02/02/opentelemetry-collector-vs-agent-how-to-choose-the-right-telemetry-approach
Unconventional PostgreSQL Optimizations

Creative ideas for speeding up queries in PostgreSQL


https://hakibenita.com/postgresql-unconventional-optimizations
Scaling Terraform Across many Teams: A Native Framework for Platform Engineering

This write-up presents a pure Terraform framework where 50+ teams deploy infrastructure using simple tfvars while platform teams maintain reusable building blocks. It highlights native lookup patterns, automated PR updates, and significant boilerplate reduction without adding preprocessing layers.


https://dev.to/jverhoeks/-scaling-terraform-across-many-teams-a-native-framework-for-platform-engineering-3n0b
Create readable terraform plans for pull request reviews with tfplan2md

This article introduces tfplan2md, a tool that converts Terraform JSON plans into clearer markdown summaries for pull request reviews. It focuses on making plan output easier to understand in Azure DevOps and GitHub workflows.


https://levelup.gitconnected.com/create-readable-terraform-plans-for-pull-request-reviews-with-tfplan2md-ea646e00e59b
Why the OpenTelemetry Batch Processor is Going Away (Eventually)

This article explains why OpenTelemetry no longer recommends the batch processor for production durability-sensitive pipelines. It compares in-memory batching with exporter-level persistent queues and shows how the newer approach improves recovery during collector restarts.


https://www.dash0.com/blog/why-the-opentelemetry-batch-processor-is-going-away-eventually
A Comprehensive Comparison of Cloud Backup Tools

This blog post is a comparison of personal, accessible, cloud backup options.


https://www.ybrikman.com/blog/2026/02/03/computer-backup-options
Tuckr

Tuckr is a dotfile manager inspired by Stow and Git. Tuckr aims to make dotfile management less painful. It follows the same model as Stow, symlinking files onto $HOME. It works on all the major OSes (Linux, Windows, BSDs and macOS).

Tuckr aims to bring the simplicity of Stow to a dotfile manager with a very small learning curve. To achieve that goal, Tuckr tries to cover only what is directly needed to manage dotfiles and nothing else. We won't wrap git, rm, cp or reimplement functionality that is perfectly covered by other utilities in the system unless it greatly impacts usability.


https://github.com/RaphGL/Tuckr
yoke

Yoke is a Helm-inspired infrastructure-as-code (IaC) package deployer designed to provide a more powerful, safe, and programmatic way to define and deploy packages. While Helm relies heavily on static YAML templates, Yoke takes IaC to the next level by allowing you to leverage general-purpose programming languages for defining packages, making it safer and more powerful than its predecessors.


https://github.com/yokecd/yoke
synapse

XDR with eBPF-powered firewall and proxy.


https://github.com/gen0sec/synapse
korrel8r

Korrel8r is a rule-based correlation engine that automatically discovers and graphs relationships between cluster resources and observability signals across multiple data stores, enabling unified troubleshooting experiences.


https://github.com/korrel8r/korrel8r
lynq

Lynq Operator is a Kubernetes operator that automates database-driven infrastructure provisioning. It reads data from external datasources and dynamically creates, updates, and manages Kubernetes resources using declarative templates.


https://github.com/k8s-lynq/lynq
k8s-sidecar

This is a Docker container intended to run inside a Kubernetes cluster to collect ConfigMaps with a specified label and store the included files in a local folder.


https://github.com/kiwigrid/k8s-sidecar