DevOps&SRE Library

10 Elasticsearch Production Issues (and How Postgres Avoids Them)

Elasticsearch may work well in initial testing and development, but production is a different story. This post is about what happens after you ship: the JVM tuning, the shard math, the 3 AM pages, the sync pipelines that break silently. The stuff your ops team lives with.

After years of teams running Elasticsearch in production, certain patterns keep emerging. The same issues show up in blog posts, Stack Overflow questions, and incident reports. We've compiled ten of the most common ones below, with references to the engineers who've documented them. We've also added images so you can skim the list quickly and compare each challenge against Postgres.

TLDR: With great power comes great operational complexity.

https://www.tigerdata.com/blog/10-elasticsearch-production-issues-how-postgres-avoids-them


How OpenAI Scales Postgres to Power 800 Million ChatGPT Users

For years, PostgreSQL has been one of the most critical, under-the-hood data systems powering core products like ChatGPT and OpenAI’s API. As our user base grows rapidly, the demands on our databases have increased exponentially, too. Over the past year, our PostgreSQL load has grown by more than 10x, and it continues to rise quickly.

https://openai.com/index/scaling-postgresql


Introduction to Buffers in PostgreSQL

The work around RegreSQL led me to focus a lot on buffers. If you are a casual PostgreSQL user, you have probably heard about adjusting shared_buffers and followed the good old advice to set it to 1/4 of available RAM. But after we got a little too enthusiastic about them on a recent Postgres FM episode, I've been asked what that's all about.

Buffers are one of those topics that easily get forgotten. And while they are a foundational block of PostgreSQL's performance architecture, most of us treat them as a black box. This article is going to attempt to change that.
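As a rough illustration of the 1/4-of-RAM rule of thumb mentioned above, here is a minimal Python sketch of the arithmetic. The function names and the 16 GB example are assumptions for illustration only and are not taken from the linked article; the result is a starting point, not a tuned value.

```python
def suggested_shared_buffers_mb(total_ram_mb: int, fraction: float = 0.25) -> int:
    """Return a shared_buffers size in MB using the rule-of-thumb fraction of RAM.

    The 0.25 default mirrors the "1/4 of available RAM" advice; it is a
    starting point for experimentation, not a recommendation for every workload.
    """
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_ram_mb * fraction)


def as_postgresql_conf_line(size_mb: int) -> str:
    """Format the value as a line that could go into postgresql.conf."""
    return f"shared_buffers = {size_mb}MB"


if __name__ == "__main__":
    # Hypothetical server with 16 GB of RAM.
    size = suggested_shared_buffers_mb(16 * 1024)
    print(as_postgresql_conf_line(size))
```

Whether that value is actually appropriate depends on the workload, which is the kind of question the linked article sets out to explore.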

https://boringsql.com/posts/introduction-to-buffers


Why Your HA Architecture is a Lie (And That's Okay)

https://mydbanotebook.org/posts/why-your-ha-architecture-is-a-lie-and-thats-okay


Is the future of MySQL PostgreSQL (Or MariaDB, or TiDB, or ...)?

https://stokerpostgresql.blogspot.com/2026/01/is-future-of-mysql-postgresql-or.html


“You Had One Job”: Why Twenty Years of DevOps Has Failed to Do it

I think the entire DevOps movement was a mighty, twenty-year battle to achieve one thing: a single feedback loop connecting devs with prod. On those grounds, it failed.

https://www.honeycomb.io/blog/you-had-one-job-why-twenty-years-of-devops-has-failed-to-do-it


OpenTelemetry Collector vs agent: How to choose the right telemetry approach

https://www.cncf.io/blog/2026/02/02/opentelemetry-collector-vs-agent-how-to-choose-the-right-telemetry-approach


Unconventional PostgreSQL Optimizations

Creative ideas for speeding up queries in PostgreSQL

https://hakibenita.com/postgresql-unconventional-optimizations


Scaling Terraform Across many Teams: A Native Framework for Platform Engineering

This write-up presents a pure Terraform framework where 50+ teams deploy infrastructure using simple tfvars while platform teams maintain reusable building blocks. It highlights native lookup patterns, automated PR updates, and significant boilerplate reduction without adding preprocessing layers.

https://dev.to/jverhoeks/-scaling-terraform-across-many-teams-a-native-framework-for-platform-engineering-3n0b


Create readable terraform plans for pull request reviews with tfplan2md

This article introduces tfplan2md, a tool that converts Terraform JSON plans into clearer markdown summaries for pull request reviews. It focuses on making plan output easier to understand in Azure DevOps and GitHub workflows.

https://levelup.gitconnected.com/create-readable-terraform-plans-for-pull-request-reviews-with-tfplan2md-ea646e00e59b


Why the OpenTelemetry Batch Processor is Going Away (Eventually)

This article explains why OpenTelemetry no longer recommends the batch processor for durability-sensitive production pipelines. It compares in-memory batching with exporter-level persistent queues and shows how the newer approach improves recovery during collector restarts.

https://www.dash0.com/blog/why-the-opentelemetry-batch-processor-is-going-away-eventually


A Comprehensive Comparison of Cloud Backup Tools

This blog post compares personal, accessible cloud backup options.

https://www.ybrikman.com/blog/2026/02/03/computer-backup-options


Tuckr

Tuckr is a dotfile manager inspired by Stow and Git. Tuckr aims to make dotfile management less painful. It follows the same model as Stow, symlinking files onto $HOME. It works on all the major OSes (Linux, Windows, BSDs and macOS).

Tuckr aims to bring the simplicity of Stow to a dotfile manager with a very small learning curve. To achieve that goal, Tuckr tries to cover only what is directly needed to manage dotfiles and nothing else. We won't wrap git, rm, cp or reimplement functionality that is perfectly covered by other utilities in the system unless it greatly impacts usability.
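To make the Stow-style model concrete, here is a minimal Python sketch of symlinking a group of dotfiles into a home directory. It is not Tuckr's implementation or CLI: the `link_group` function, the directory layout, and the "never overwrite" policy are all assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def link_group(group_dir: Path, home: Path) -> List[Path]:
    """Mirror every file under group_dir into home as a symlink.

    Hypothetical helper: directories are created as real directories,
    files become symlinks pointing back at the dotfiles repository.
    Existing destinations are left untouched rather than overwritten.
    """
    created: List[Path] = []
    for src in sorted(group_dir.rglob("*")):
        if src.is_dir():
            continue
        dest = home / src.relative_to(group_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        if dest.exists() or dest.is_symlink():
            continue  # conflict: skip instead of clobbering user files
        dest.symlink_to(src.resolve())
        created.append(dest)
    return created


if __name__ == "__main__":
    # Assumed layout: ~/dotfiles/vim/.vimrc ends up linked at ~/.vimrc
    for link in link_group(Path.home() / "dotfiles" / "vim", Path.home()):
        print(f"linked {link}")
```

Running the function a second time creates nothing new, because destinations that already exist are skipped.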

https://github.com/RaphGL/Tuckr


Dynamic Istio Ingress Gateway Management with Kyverno

https://medium.com/devtopia/dynamic-istio-ingress-gateway-management-with-kyverno-a807b6e3f0f8


Ephemeral Infrastructure: Why Short-Lived is a Good Thing

https://lukasniessen.medium.com/ephemeral-infrastructure-why-short-lived-is-a-good-thing-2cf26afd75ef


yoke

Yoke is a Helm-inspired infrastructure-as-code (IaC) package deployer designed to provide a more powerful, safe, and programmatic way to define and deploy packages. While Helm relies heavily on static YAML templates, Yoke takes IaC to the next level by allowing you to leverage general-purpose programming languages for defining packages, making it safer and more powerful than its predecessors.

https://github.com/yokecd/yoke


synapse

XDR with eBPF-powered firewall and proxy.

https://github.com/gen0sec/synapse


korrel8r

Korrel8r is a rule-based correlation engine that automatically discovers and graphs relationships between cluster resources and observability signals across multiple data stores, enabling unified troubleshooting experiences.

https://github.com/korrel8r/korrel8r


lynq

Lynq Operator is a Kubernetes operator that automates database-driven infrastructure provisioning. It reads data from external datasources and dynamically creates, updates, and manages Kubernetes resources using declarative templates.

https://github.com/k8s-lynq/lynq


k8s-sidecar

This is a Docker container intended to run inside a Kubernetes cluster to collect ConfigMaps with a specified label and store the included files in a local folder.

https://github.com/kiwigrid/k8s-sidecar
