Kubernative by Palark | Kubernetes news and goodies
News, articles, tools, and other useful cloud native stuff for DevOps, SRE and software engineers. This channel is managed by Palark GmbH. Contact @dshnow to suggest your content.
Here goes our latest bunch of interesting Kubernetes-related articles recently spotted online:

1. "Kubernetes Homelab Series (Part 1): How I Built My Kubernetes Cluster from Scratch" by Pablo del Arco.

In this series, I’ll share my journey of building a Kubernetes homelab from scratch — the tools, the wins, the obstacles, and the lessons — all based on personal, real-world experiences rather than typical tutorials. [..] To kick things off, I started by setting up a K3s cluster — a lightweight Kubernetes distribution perfect for homelabs.


2. "Fuzzing the CNCF landscape in 2024" by Chris Aniszczyk (CNCF), Adam Korczynski (Ada Logics), David Korczynski (Ada Logics).

CNCF maintains a high level of security for its projects by way of a series of initiatives such as security auditing, supply-chain assessments and security automation work. In this blogpost we will go over CNCF’s fuzzing initiative and its impact in 2024. Fuzzing is a technique for finding security and reliability bugs by way of executing vast amounts of arbitrary inputs against a given API or codebase.
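
To make the definition of fuzzing above concrete, here is a minimal sketch in Python. It is not taken from the CNCF post: the target `parse_percentage` and the helper `fuzz` are hypothetical names invented for illustration, and real projects typically use coverage-guided fuzzers rather than a purely random loop like this one.

```python
import random

def parse_percentage(text: str) -> int:
    """Toy target (hypothetical): parse strings such as '42%' into an int."""
    if not text.endswith("%"):
        raise ValueError("missing % sign")
    value = int(text[:-1])  # raises ValueError on malformed digits
    if not 0 <= value <= 100:
        raise ValueError("out of range")
    return value

def fuzz(target, runs: int = 10_000, seed: int = 0) -> list:
    """Feed random strings to `target` and collect every input that raises
    something other than the documented ValueError."""
    rng = random.Random(seed)
    alphabet = "0123456789%-+ x"
    findings = []
    for _ in range(runs):
        length = rng.randint(0, 6)
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            target(candidate)
        except ValueError:
            pass  # expected rejection of malformed input
        except Exception as exc:  # any other exception counts as a finding
            findings.append((candidate, exc))
    return findings
```

Calling `fuzz(parse_percentage)` returns an empty list because the toy parser only ever raises `ValueError`; a target that raises any other exception would show up in the returned findings together with the input that triggered it.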


3. "Exploring the Kubernetes API Server Proxy" by Rory McCune.

[..] I thought it’d be interesting to look at a lesser known feature of the Kubernetes API server which has some interesting security implications. The Kubernetes API server can act as an HTTP proxy server, allowing users with the right access to get to applications they might otherwise not be able to reach. This is one of a number of proxies in the Kubernetes world which serve different purposes. The proxy can be used to access pods, services, and nodes in the cluster; we’ll focus on pods and nodes for this post.
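
As a rough illustration of the pod and node proxies mentioned above, the sketch below builds the API server URL paths for both. The function names are hypothetical and the example is not taken from the article; the path layout follows the core v1 API's `proxy` subresource.

```python
from urllib.parse import quote

def pod_proxy_path(namespace: str, pod: str, port: int, subpath: str = "") -> str:
    """Path that asks the API server to forward a request to a pod's port."""
    return (
        f"/api/v1/namespaces/{quote(namespace)}"
        f"/pods/{quote(pod)}:{port}/proxy/{subpath.lstrip('/')}"
    )

def node_proxy_path(node: str, subpath: str = "") -> str:
    """Path that asks the API server to forward a request to a node's kubelet."""
    return f"/api/v1/nodes/{quote(node)}/proxy/{subpath.lstrip('/')}"

# Example values below are made up for illustration.
print(pod_proxy_path("default", "web-0", 8080, "/healthz"))
print(node_proxy_path("node-1", "metrics"))
```

A path produced this way can be requested with `kubectl get --raw <path>`, provided the caller's RBAC permissions include the corresponding `pods/proxy` or `nodes/proxy` subresource.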


4. "Would the Kubernetes CPU limit be an anti-pattern?" by Carlos Alberto Alves Correia.

Most of the time, when you ask a DevOps engineer if it is good practice to set the limit for deployments, 99% of them will say YES. I see that there is a consensus among professionals that it is good to block resources to prevent a hungry application from consuming all the resources of the cluster. Part of this is true, but not for the CPU and I will explain why.


5. "Cluster API to production: from Cluster API to GitOps with Argo CD and Kyverno" by Lior Friedman.

For Argo CD to deploy resources in tenant clusters we first need to configure the clusters in Argo CD. This guide goes over automatically generating Argo CD cluster credentials secrets using Kyverno. By the end of this guide, we will be able to deploy addons to Cluster API tenant clusters with Argo CD from the management cluster.


6. "How to Create a Production-Ready EKS Cluster on AWS Using Terraform (Part 2: EKS Setup)" by Alex Tsvetkov.

In Part 2, we’ll cover configuring the EKS cluster with Terraform, setting up managed node groups, and integrating IAM roles and policies for secure and efficient cluster operations.


#articles
Here goes our latest bunch of interesting Kubernetes-related articles recently spotted online:

1. "Kafka vs NATS: A Comparison for Message Processing" by Josson Paul Kalapparambath.

Kafka and NATS are two popular tools for handling streaming and messaging. They have different architectures and different performance characteristics. They are suitable for specific use cases. In this article, we will compare the features of NATS with Kafka and explain the use cases I addressed at work.


2. "Kubectl-r[ex]ec: A kubectl plugin for auditing kubectl exec commands" by Marton Natko, Adyen.

With this minimalistic application, we can easily audit exec commands, and we only have to install a few manifests on the Kubernetes side while distributing our plugin to our engineers.


3. "Kubernetes Best Practices I Wish I Had Known Before" by Engin Diri, Pulumi.

In this post, I will highlight some crucial Kubernetes best practices. They are from my years of experience with Kubernetes in production. Think of this as the curated “Kubernetes cheat sheet” you wish you had from Day 1. Buckle up; it’s going to be an exciting ride.


4. "Configuration Management at Ant Group: Generated Manifest & Immutable Desired State" by KusionStack.

In this first article, we will examine the specific challenges we encountered over the years, the strategies we devised to address them, and the resulting patterns that have emerged as what we believe to be best practices — Generated Manifest & Immutable Desired State. Through this exploration, we aim to provide valuable insights and practical guidance for navigating the complexities of configuration management in a dynamic and highly regulated environment.


5. "So you wanna write Kubernetes controllers?" by Ahmet Alp Balkan.

Low barrier to entry combined with good intentions and the “illusion of working implementation” is not a recipe for success while developing production-grade controllers. I’ve seen the real-world consequences of controllers developed without adequate understanding of Kubernetes and the controller machinery at multiple large companies. We went back to the drawing board and rewrote nascent controller implementations a few times to observe which mistakes people new to controller development make.


6. "Kubernetes RBAC: A Comprehensive Guide" by Oshrat Nir.

Kubernetes RBAC is a method used to manage user access rights to resources within a Kubernetes cluster. It enables administrators to grant users or applications only the permissions they need to perform their tasks, and no more. RBAC uses authentication and authorization to achieve its purpose by verifying the identity of a user or system trying to access the Kubernetes API server.


#articles
Here goes our latest bunch of interesting Kubernetes-related articles recently spotted online:

1. "How I Supercharged My Local Kubernetes Setup" by Joseph Whiteaker.

Setting up a local Kubernetes cluster with Kind, MetalLB, and Istio takes some effort, but the payoff is a highly flexible, production-like environment that runs entirely on your local machine. Through this process, I’ve explored custom networking, pull-through registries, certificate management, and domain-based routing — all things that bring local Kubernetes closer to how real-world clusters operate.


2. "Vulnerability management in the microservice era: From zero to hero" by Nigel Douglas, Sysdig.

Kubernetes vulnerability scanning is the process of systematically inspecting a Kubernetes cluster, including its container images and configurations, to detect security misconfigurations or vulnerabilities that could compromise the platform’s security posture. It’s an essential practice for organizations to maintain a strong security posture and it offers several critical benefits.


3. "OpenTelemetry Collector deployment modes in Kubernetes" by Reese Lee & Brad Schmitt, New Relic.

A good way to simplify this process [deploying the OpenTelemetry Collector] is to familiarize yourself with "Collector deployment modes"—the various methods for setting up and managing the Collector to gather, process, and export application and system data within Kubernetes. It’s important to note that “deployment modes” differ from “deployment patterns,” a distinction that can be confusing. This blog post guides you through these key concepts so you’ll have the foundational knowledge you need to choose the right deployment mode for your observability strategy.


4. "Cluster API + Talos + Proxmox = ❤️" by Quentin Joly.

Today, I am tackling another aspect of Talos: provisioning Kubernetes clusters on Proxmox VMs via the Cluster API. I do not have the expertise to write a comprehensive article on the Cluster API, nor have I tested multiple providers or clouds. In this article, I will instead present my journey to deploy a Talos cluster on Proxmox via the Cluster API, detailing the steps, encountered issues, and solutions found.


5. "OpenTelemetry: A Guide to Observability with Go" by Luca Cavallin, a CNCF Ambassador.

In this post, I'll walk through how to integrate OpenTelemetry in a Go application. By the end, you'll have a reusable telemetry package that sets up logging, metrics, and tracing - all without cluttering your application code! I've published the package, complete with tests and examples, on GitHub: gotel. Feel free to use it as a starting point for your own projects.


6. "How to Build a Multi-Tenancy Internal Developer Platform with GitOps and vCluster" by Artem Lajko.

Here’s what you can expect from this blog:
- Introduction to Kubernetes and Internal Developer Platforms
- The Role of Platform Engineering in Building and Managing an IDP
- Implementing GitOps with Argo CD to Manage Your IDP Seamlessly
- Cost-Efficient Strategies for Multi-Tenant IDPs
- Hands-On Guide and GitHub Resources


#articles
Here goes our latest bunch of interesting Kubernetes-related articles recently spotted online:

1. "Standardizing App Delivery with Flux and Generic Helm Charts" by Stefan Prodan, ControlPlane.

In this guide we will explore how Flux can be used to standardize the lifecycle management of applications by leveraging the Generic Helm Chart pattern. The big promise of this pattern is that it should reduce the cognitive load on developers, as they only need to focus on the service-specific configuration, while the Generic Helm Chart shields them from the complexity of the Kubernetes API.


2. "The 100 Million Pod Mesh" by John Howard, Solo.io.

In this test, we deploy 100 million pods across 2,000 clusters, proving it can handle extreme scale with minimal resources, near-instant updates, and no manual tuning, resulting in effortless scalability and cost efficiency for enterprises.


3. "The AI Model Showdown – LLaMA 3.3-70B vs. Claude 3.5 Sonnet v2 vs. DeepSeek-R1/V3" by Itiel Shwartz, Komodor.

We tested DeepSeek’s models head-to-head against industry leaders in solving real-world Kubernetes challenges. The results were nothing short of fascinating and quite revealing, particularly regarding DeepSeek’s current capabilities in production environments.


4. "Simplifying Ingress Resource on AWS EKS: A Guide to AWS Load Balancer Controller" by Kenny Ang.

In this article, we will explore the AWS LBC and understand how it works (and doesn’t). To achieve this, I will walk you through installing the AWS LBC on an EKS cluster and observe the behavior after creating an ingress resource.


5. "Managing Secrets at Scale: Why We Chose SOPS for Terraform and Kubernetes Secrets" by Teodor J. Podobnik.

From SSH keys and Kubernetes Secrets to GitHub tokens and API credentials, keeping these secrets secure was vital to our product’s security and compliance. So we looked into several solutions like HashiCorp Vault, SealedSecrets and GCP Secret Manager but none fully met our needs.


6. "EKS vs. GKE Networking" by Jason Umiker.

I find that some of the biggest differences between EKS and GKE (as well as the underlying AWS and GCP) are in their differing approaches to networking. So, this is at the heart of any true comparison of the two services.


#articles
Here goes our latest bunch of interesting Kubernetes-related articles recently spotted online:

1. "My kubernetes pods keep crashing with CrashLoopBackOff but I can’t find any log" by Harold Finch.

When a Kubernetes pod goes into a CrashLoopBackOff state and you can't find any logs, it can be frustrating. Here’s a step-by-step troubleshooting guide to help identify and fix the issue.


2. "What we learned after running Airflow on Kubernetes for 2 years" by Alexandre Magno Lima Martins.

To put it in perspective, we have over 300 DAGs in production, running more than 5,000 tasks per day, on average. So I would say that we have a medium-size Airflow deployment, capable of delivering value for our users. For more than 8 months now we have been running without a single incident or failure in Airflow. With this post, I want to share important aspects of our deployment that helped us to achieve a scalable and reliable environment.


3. "Falco" by Luc Juggery.

The following gives an overview of Falco, a security tool that provides runtime security across hosts, containers, Kubernetes, and cloud environments. [It covers:] Installing Falco, Enabling falcosidekick, Enabling falcosidekick web UI, and Custom events.


4. "Demo an Automated Canary Deployment on Kubernetes with Argo Rollouts, Istio, and Prometheus" by Whitney Lee, a CNCF Ambassador.

Building stuff is fun! Let’s use Argo Rollouts, Istio, and Prometheus to automate a canary deployment on Kubernetes! The application we’ll run is the Argo Rollouts Demo Application which does a great job of visualizing how traffic is slowly routed from the older, stable version of the application to the newer “canary” version.


5. "Getting Started with K3s: A Practical Guide to Setup and Scaling" by Joseph Whiteaker.

This post serves as both an introductory guide for those new to K3s and a quick reference for those already familiar with it. We’ll cover installation, adding server and worker nodes, configuring load balancing, etc…


6. "Kubernetes Control Plane Load Balancing (CPLB) Explained" by Juan Luis de Sousa-Valadas, Mirantis.

CPLB, with its evolution to a userspace reverse proxy load balancer, offers a simplified and more compatible approach compared to the previous IPVS-based system. When combined with k0s it is possible to build lightweight, but highly available Kubernetes clusters.


#articles
Here goes our latest selection of interesting Kubernetes-related articles recently spotted online:

1. "Securing the Kubernetes Host Operating System" by Rafael Natali.

If the host operating system is breached, the attacker could use it to target other nodes in the cluster, along with all the Pods and applications running on that node. Eventually, the attacker can even access other systems in your network! The next subsections contain the information necessary to secure the host operating system.


2. "Every pod eviction in Kubernetes, explained" by Ahmet Alp Balkan.

There are so many ways Kubernetes terminates workloads, each with a non-trivial (and not always predictable) machinery, and there’s no page that lists out all eviction modes in one place. This article will dig into Kubernetes internals to walk you through all the eviction paths that can terminate your Pods, and why “kubelet restarts don’t impact running workloads” isn’t always true, and finally I’ll leave you with a cheatsheet at the end.


3. "WebAssembly on Kubernetes" by Nicolas Fränkel.

In this post, I showed how to use WebAssembly on Kubernetes with the WasmEdge runtime. I created three flavors for comparison purposes: native, embed, and runtime. The first two are "regular" Docker images, while the latter contains only a single Wasm file, which makes it very lightweight and secure.


4. "Yoke is really cool" by Xe Iaso.

With Yoke, you write your infrastructure definitions in Go or Rust, compile it to WebAssembly, and then you take input and output Kubernetes manifests that get applied to the cluster. [..] One of the big advantages of using WebAssembly here is that you can use the same Kubernetes manifest types that Kubernetes itself uses. This means you don't have to write your own types and you can reuse code aggressively.


5. "Exploring Cloud Native projects in CNCF Sandbox. Part 3: 14 arrivals of 2024 H1" by Dmitry Shurupov, Palark.

We’re continuing this series with our brief introductions to the projects added to the Sandbox in April, June, and July of 2024: Radius, Stacker, Score, Bank-Vaults, TrestleGRC, bpfman, Koordinator, KubeSlice, Atlantis, Kubean, Connect, Kairos, Kuadrant, and openGemini.


6. "How to Setup Preview Environments with FluxCD in Kubernetes" by Meysam Azad.

A preview environment is where you see a live state of your changes from your pull request before being merged into the default branch. It gives you a look and feel of what it would be like if you merged your changes. [..] In this blog post, I will show you how to achieve this using FluxCD Operator.


7. "Container Network Interface (CNI) in Kubernetes: An Introduction" by Homayoon (Hue) Alimohammadi.

In this article, we’re gonna learn about the Container Network Interface (CNI) and CNI plugins, what they’re supposed to do, and how they’re implemented. We’ll also see a simple CNI implementation in Go and Bash, and test it in a Canonical Kubernetes cluster.


#articles
A couple of recent articles on optimising memory consumption in Prometheus:

1. Prometheus: How We Slashed Memory Usage (Devoriales).

“In the production Kubernetes cluster I worked on, Prometheus memory usage climbed past 55 GB, peaking at 60 GB, despite an already oversized node. Indeed, the environment was rapidly growing in the number of applications, but the situation was still not sustainable.”


2. Understanding and optimizing resource consumption in Prometheus (Palark blog).

“While Prometheus is an excellent and capable monitoring system, one aspect I find very frustrating is its resource consumption. If this frustrates you as much as it does me, let’s break down the causes of this issue and see how to address it.”


#articles #observability
We haven’t shared any Kubernetes-related articles for a while. Filling this gap with some of the latest interesting reads:

1. Kubernetes is not just for Black Friday by Thibault Martin.
I’ve always ruled out Kubernetes as too complex machinery designed for large organizations who face significant surges in traffic during specific events like Black Friday sales. I thought Kubernetes had too many moving parts and would work against my objectives. I was wrong. Kubernetes is not just for large organizations with scalability needs I will never have. Kubernetes makes perfect sense for a homelabber who cares about having a simple, sturdy setup.


2. Exploring Cloud Native projects in CNCF Sandbox. Part 4: 13 arrivals of 2024 H2 by Dmitry Shurupov, Palark.
Familiarise yourself with the following recently added CNCF projects: Ratify, Cartography, HAMi, KAITO, Kmesh, Sermant, LoxiLB, OVN-Kubernetes, Perses, Shipwright, KusionStack, youki, OpenEBS!


3. Kubernetes List API performance and reliability by Ahmet Alp Balkan.
We use Kubernetes beyond officially supported/tested scale limits by running more than 5,000 nodes and over a hundred thousand pods in a single cluster. In these large scale setups, expensive “list” calls on the Kubernetes API are the Achilles heel of control plane reliability and scalability. In this article, I’ll explain which list call patterns pose the most risk, and how recent and upcoming Kubernetes versions are improving the list API performance.


4. Kubernetes Networking from Packets to Pods by Luca Cavallin.
The TCP/IP model, which powers the modern internet, is composed of four primary layers: [..] Understanding this layered approach is fundamental, as every network packet in a Kubernetes cluster adheres to this model. We'll explore this entire ecosystem in three parts: the foundational technologies that make it all possible, the core Kubernetes model itself, and finally, advanced topics and practical guides.


5. What Would a Kubernetes 2.0 Look Like by Matthew Duggan.
Some common trends have emerged, where mistakes or misconfiguration arise from where Kubernetes isn't opinionated enough. Even ten years on, we're still seeing a lot of churn inside of the ecosystem and people stepping on well-documented landmines. So, knowing what we know now, what could we do differently to make this great tool even more applicable to more people and problems?


6. Rootless container builds on Kubernetes by Spyros Trigazis, CERN.
In this post, we will present 3 options (podman/buildah, buildkit and kaniko) for building container images in Kubernetes pods as non-root with containerd 2.x as runtime. Further improvements can be made using kata-containers, firecracker, gvisor or others but the complexity increases and administrators have to maintain multiple container runtimes.


#articles
The next Kubernetes release, 1.34, is scheduled for 27th August. The earliest article covering the upcoming changes was just published on the project’s blog. Its feature highlights include:

- An alpha version of KYAML, a new YAML subset that was designed for Kubernetes and aims to be safer and less ambiguous;
- Improved tracing for kubelet and API Server;
- Structured parameters for Dynamic Resource Allocation (DRA) becoming stable;
- ServiceAccount tokens for image pull authentication moving to beta;
- PreferSameZone and PreferSameNode traffic distribution for Services moving to beta.

UPD: An even better (more detailed) overview of the new K8s v1.34 features can be found in this excellent article by Nigel Douglas from Cloudsmith.

#news #releases #articles
Here goes our latest bunch of interesting Kubernetes-related articles recently spotted online:

1. "How I Survived the Great Kubernetes Exodus: Migrating EKS Cluster from v1.26 to v1.33 on AWS" by Ukeme David Eseme.
So when it was time to migrate a client’s 3-4-year-old Amazon EKS cluster from v1.26 to v1.33, I knew it wouldn’t just be a version bump: it would be a battlefield. This cluster wasn't just any cluster. It was a complex ecosystem running critical healthcare applications with: 46 Custom Resource Definitions (CRDs) across multiple systems, 7 production domains with SSL certificates, Critical data in PostgreSQL databases, Zero downtime tolerance for production services, Complex networking with Istio service mesh, Monitoring stack with Prometheus and Grafana…


2. "Debugging the One-in-a-Million Failure: Migrating Pinterest’s Search Infrastructure to Kubernetes" by Samson Hu, Shashank Tavildar, Eric Kalkanger, and Hunter Gatewood (Pinterest).
While migrating Pinterest’s search infrastructure — which powers core experiences for millions of users monthly — to Kubernetes, we faced a challenge in the new environment: one in every million search requests took 100x longer than usual. This post chronicles our investigation, uncovering an elusive interaction between our memory-intensive search system and a seemingly innocent monitoring process. The journey involves profiling search systems, debugging performance issues, Linux kernel features, and memory management.


3. "How we tracked down a Go 1.24 memory regression across hundreds of pods" by Nayef Ghattas, Datadog.
Our story begins while the new version was being rolled out internally. Shortly after deploying it to one of our data-processing services, we noticed an unexpected memory usage increase. We observed the same pattern, a ~20% increase in memory usage, across multiple environments before pausing the rollout. To confirm our suspicions, we conducted a bisect in the staging environment, which pointed directly to the Go 1.24 upgrade as the culprit.


4. "Production-Grade Pain: Lessons From Scaling Kubernetes on EKS" by Aditya Chowdhry, Probo.
Using AWS’s managed Kubernetes offering (EKS) initially simplified our infrastructure management, but as our application grew in scale and complexity, we faced several unexpected challenges in Scaling (Cluster Autoscaler Wasn’t Enough), Networking (Ingress Wars: AWS ALB vs. NGINX), and Application Behavior (Pod Sizing Matters; Graceful Termination; HPA Tuning).


5. "Kubernetes Monitoring — A Complete Solution, Part 8: Logging with VictoriaLogs" by Ryan Jacobs.
Part 8 in a series of posts where we’ll stand up an entire monitoring stack on my home Talos Linux cluster. [..] VictoriaLogs, which is made by the same team as VictoriaMetrics, only stores its data in a local directory, which can be backed by whatever your CSI provides in Kubernetes, and even plays well with NFS just like VM does.


6. "K8sGPT for Kubernetes troubleshooting: How AI helps in different cases" by Evgeny Torin, Palark.
In this article, I will explain what K8sGPT is, how to install it and connect to AI, and which features it offers. I will also share some examples of the output you can expect from this tool and what diagnostics it can perform. Throughout the preparation of this overview, I tested different AI integrations available as well as a number of models (including a local one). All of my examples will be backed up by commands and detailed logs.


#articles
Here goes our latest bunch of interesting Kubernetes-related articles recently spotted online:

1. "Kubernetes 1.34: Deep dive into new alpha features" by Kirill Kononovich, Palark.
Kubernetes 1.34’s anticipated release is coming on August 27th. With that around the corner, we’ve prepared a comprehensive run-through of the fascinating 13 alpha features in this release, examining each of them in detail. From asynchronous API calls and granular container restart rules to native Pod certificates and the new KYAML format, let’s dive into the exciting updates the upcoming K8s version has in store!


2. "My process to debug DNS timeouts in a large EKS cluster" by Jack Lindamood, Anthropic.
We run a very large AWS EKS cluster with lots of interesting challenges. This post is about a recent investigation into DNS resolution failures that we were able to root cause to an Elastic Network Interface (ENI) packets per second (PPS) limit and a further root cause of the combination of sudo defaults and ndots in our cluster DNS.
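
For readers unfamiliar with `ndots`, the sketch below models how a glibc-style resolver expands a name using the `search` list and `ndots` option from `/etc/resolv.conf`. It is a simplified illustration, not the article's analysis: the function name is hypothetical, and the search list and `ndots:5` used in the example are the usual Kubernetes defaults for a pod in the `default` namespace.

```python
def candidate_queries(name: str, search: list, ndots: int = 5) -> list:
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search expansion.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots: every search domain is tried before the name itself.
    return expanded + [absolute]

# Assumed defaults for a pod in the "default" namespace.
k8s_search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for query in candidate_queries("example.com", k8s_search):
    print(query)
```

With these settings, a lookup of an external name such as `example.com` can produce three failed queries against cluster-internal suffixes before the real one is sent, which is one way `ndots` multiplies DNS traffic. Writing the name with a trailing dot (`example.com.`) skips the expansion.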


3. "Seamless Istio Upgrades at Scale" by Rushy R. Panchal, Airbnb.
Airbnb has been running Istio at scale since 2019. We support workloads running on both Kubernetes and virtual machines (using Istio’s mesh expansion). Across these two environments, we run tens of thousands of pods, dozens of Kubernetes clusters, and thousands of VMs. [..] Istio is a foundational piece of our architecture, which makes ongoing maintenance and upgrades a challenge. Despite that, we have upgraded Istio a total of 14 times. This blog post will explore how the Service Mesh team at Airbnb safely upgrades Istio while maintaining high availability.


4. "The Simplest GitOps Implementation That Actually Works" by Gabriel Garrido.
In this article we will strip GitOps down to its bare essentials and build the simplest implementation that actually works. No fancy operators, minimal tooling - just Git, GitHub Actions, and a sprinkle of automation magic. [..] For the deployment part, I’m using ArgoCD to watch the manifests repository and sync changes to the cluster, but you could just as easily apply the manifests manually or use a simple CronJob. The beauty is in the simplicity of the pipeline itself.


5. "From Linux Primitives to Kubernetes Security Contexts" by Dave Altena, LearnKube.
The Kubernetes API offers several ways to restrict container privileges using the Security Context. [..] Many teams discover these controls only after a security audit or scanner flags a running container. The next steps are usually reactively patching the config, suppressing the warning and moving on. Before we get into Kubernetes SecurityContexts, we need to understand what they're actually configuring under the hood.


#articles
Sharing another bunch of interesting Kubernetes-related articles recently spotted online:

1. "Tuning Linux Swap for Kubernetes: A Deep Dive" by Ajay Sundar Karuppasamy.
In this blogpost, I'll dive into critical Linux kernel parameters that govern swap behavior. I will explore how these parameters influence Kubernetes workload performance, swap utilization, and crucial eviction mechanisms. I will present various test results showcasing the impact of different configurations, and share my findings on achieving optimal settings for stable and high-performing Kubernetes clusters.


2. "Top 30 Argo CD Anti-Patterns to Avoid When Adopting Gitops" by Kostis Kapelonis, Codefresh.
Here is the full list of the antipatterns we will see: Not understanding the declarative setup of Argo CD; Creating Argo CD applications in a dynamic way; Using Argo CD parameter overrides; Adopting Argo CD without understanding Helm; Adopting Argo CD without understanding Kustomize; Assuming that developers need to know about Argo CD; Grouping applications at the wrong abstraction level; Abusing the multi-source feature of Argo CD; Not splitting the different Git repositories; Disabling auto-sync and self-heal…


3. "Manage Secrets of your Kubernetes Platform at Scale with GitOps" by Artem Lajko.
If you are building a platform on Kubernetes it does not matter what fancy name you give it. You will run into this challenge sooner or later. This blog is not trainer material. It is not about perfect labs. It is about real world experience with real pain points. The idea is simple. Instead of managing every cluster manually you connect them to a control plane. But the tricky part is how to do this in a secure and repeatable way especially when secrets are involved.


4. "Migrating from Kubernetes Ingress to Gateway API: A Step-by-Step Guide" by Kelvin Manavar.
If your organization is currently relying on Ingress and considering a migration to the Gateway API, this guide will walk you through the process. We’ll explore why the Gateway API is worth adopting, what changes you need to be aware of, and the practical steps to migrate from your existing Ingress setup to the modern Gateway API within a running Kubernetes cluster.


5. "Longhorn – a Kubernetes-native filesystem" by Vegard.
Longhorn in a way has many similarities with ZFS, but made for a distributed environment like Kubernetes. In a nutshell, Longhorn provisions block devices out of a pool – or several, I have an SSD pool and an HDD pool. You’ll create storage classes using those pools, with the properties you like. A storageclass is sort of a template for a volume, that says what properties it should have when it’s created. You can still change it afterwards, though. Longhorn also comes with a decent web console, making it easy to get an overview of – and manage – your Longhorn storage solution. It has built-in support for snapshot-based backups, most commonly to S3 (or compatible) buckets.


6. "Importance of Graceful Shutdown in Kubernetes" by Alik Khilazhev, Criteo.
In this post, I will share what I have learned about implementing proper graceful shutdown in Kubernetes. I will show you exactly what happens behind the scenes, provide working code examples, and back everything with real test results that clearly demonstrate the difference.
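
As a rough idea of what graceful shutdown looks like in application code, here is a minimal Python sketch. It is not the article's code: `serve`, `install_handler`, and `process_one_item` are hypothetical names. It relies on the fact that Kubernetes sends SIGTERM to a container's main process when the pod is being terminated, and only sends SIGKILL after the grace period expires.

```python
import signal
import threading

# Set once Kubernetes (or anyone else) asks the process to stop.
shutdown_requested = threading.Event()

def handle_sigterm(signum, frame):
    """Signal handler: only record the request; the main loop does the cleanup."""
    shutdown_requested.set()

def install_handler():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)

def serve(process_one_item, poll_interval: float = 0.1) -> int:
    """Hypothetical worker loop: keep calling process_one_item() until
    shutdown is requested, then return how many items were handled."""
    handled = 0
    while not shutdown_requested.is_set():
        if process_one_item():
            handled += 1
        else:
            # Nothing to do: sleep, but wake up early if SIGTERM arrives.
            shutdown_requested.wait(poll_interval)
    return handled
```

The loop finishes the item it is working on before it checks the flag again, so in-flight work is not cut off. That work has to fit within the pod's `terminationGracePeriodSeconds` (30 seconds by default), after which the container is killed regardless.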


#articles
AKS Labs is a free online collection of hands-on workshops for learning Azure Kubernetes Service (AKS) to deploy, scale, and manage containerised applications.

Currently, it offers 20+ labs in the following categories: Getting Started, Networking, Security, Operations, Platform Engineering, Storage, and AI Workloads. All of them come with ready-to-use instructions and listings.

#articles #career #Azure
Sharing another bunch of interesting Kubernetes-related articles recently spotted online:

1. "Beyond the surface - Exploring attacker persistence strategies in Kubernetes" by Rory McCune.
The goal of this talk is to lay out one attack path that attackers might use to retain and expand their access after an initial compromise of a Kubernetes cluster by getting access to an admin’s credentials. It doesn’t cover all the ways that attackers could do this, but provides one path and also hopefully illuminates some of the inner workings and default settings that attackers might exploit as part of their exploits.


2. "How our small company migrated from Docker Swarm to Kubernetes" by Miroslav Hrivnak, CORETEQ Technology.
As a small tech company with 20–30 people, we’ve gone through the natural evolution of infrastructure. From the days when one server and a few LXC containers were enough, to Docker and Docker Swarm, and finally to Kubernetes, which we now use not only in production but also for development and testing. In this article, I’d like to share why we migrated, the challenges we faced, and how we successfully moved from Docker Swarm to Kubernetes.


3. "k8s-1m Overview" by Ben Chess.
This is an effort to create a fully functional Kubernetes cluster with 1 million active nodes.


4. "Zero Trust for Kubernetes: Implementing Service Mesh Security" by Heinan Cabouly.
Let’s walk through a practical implementation of Zero Trust security using Istio on Amazon EKS. I’ll show you real-world configurations based on production Kubernetes environments.


5. "Clear Kubernetes namespace contents before deleting the namespace, or else" by Hongli Lai.
Our Kubernetes platform test suite creates namespaces with their corresponding contents, then deletes everything during cleanup. We noticed a strange problem: namespace deletion would sometimes get stuck indefinitely. The root cause was surprising — we had to clear the contents before deleting the namespace! We also learned that getting stuck isn’t the only issue that can occur if we don’t do this.


6. "Scaling Kubernetes at Mercado Libre with Karpenter and GitOps" by Juliano Marcos Martins, Mercado Libre.
This article explores how we’ve used Karpenter and GitOps to evolve our ecosystem (35,000 active microservices; approximately 30,000 daily deployments; around 120,000 pull requests per day), achieving automated provisioning, declarative governance, and large-scale cloud-native operations.


#articles
Here come some of the interesting Kubernetes-related articles recently spotted online:

1. "How Airbnb Runs Distributed Databases on Kubernetes at Scale" by ByteByteGo.
Instead of limiting a database cluster to one Kubernetes environment, they chose to deploy distributed database clusters across multiple Kubernetes clusters, each one mapped to a different AWS Availability Zone. This is not a common design pattern. Most companies avoid it because of the added complexity. But Airbnb’s engineers saw it as the best way to ensure reliability, reduce the impact of failures, and keep operations smooth.


2. "Kubernetes Configuration Good Practices" by Kirti Goyal, Kubernetes blog.
This blog brings together tried-and-tested configuration best practices. The small habits that make your Kubernetes setup clean, consistent and easier to manage. Whether you are just starting out or already deploying apps daily, these are the little things that keep your cluster stable and your future self sane.


3. "How Google Does It: Building the largest known Kubernetes cluster, with 130,000 nodes" by Besher Massri and Maciek Różacki, Google.
At Google Cloud, we’re constantly pushing the scalability of Google Kubernetes Engine (GKE) so that it can keep up with increasingly demanding workloads — especially AI. GKE already supports massive 65,000-node clusters, and at KubeCon, we shared that we successfully ran a 130,000-node cluster in experimental mode — twice the number of nodes compared to the officially supported and tested limit. [..] In this blog, we take a look at the trends driving demand for these kinds of mega-clusters, and do a deep dive on the architectural innovations we implemented to make this extreme scalability a reality.


4. "93% Faster Next.js in (your) Kubernetes" by Matteo Collina, Platformatic.
We'll start by examining the complications of running this powerful framework in your own environment, and get under the hood (and I mean, down to the kernel) about why they happen. Then, we'll walk you through the approach we took with Watt to solve them, and what it means for you if you happen to run Next.js or any other Node.js CPU-bound workload on-prem.


5. "OpenPERouter -- Bringing EVPN to Kubernetes" by Mengxin Liu.
Recently, while researching EVPN as a multi-tenancy solution for physical networks, I discovered the open-source project OpenPERouter. It introduces the concept of EVPN into container networking, providing a new approach to achieving multi-tenancy in Kubernetes. This solution not only unifies software and hardware network architectures but also offers some compatibility with existing CNIs like Calico, which advertise routes via BGP.


6. "Kubernetes 1.35: Deep dive into new alpha features" by Kirill Kononovich, Palark.
The Kubernetes 1.35 release, scheduled for December 17th, has gift-wrapped a variety of experimental improvements designed to enhance infrastructure flexibility and security. In this overview, we focus on its Alpha features extending across a broad spectrum of tasks: from watch-based route controller reconciliation and the long-awaited Gang Scheduling for AI/ML workloads to the secrets field for passing Service Account tokens, mutable volume attach limits, and proxying API server requests to fix version skew.


7. "Kubernetes 1.35 - New security features" by Víctor Jiménez Cerrada, Sysdig.
Kubernetes 1.35 will be released soon, bringing 17 changes to its security features. It includes new validations, the deprecation of old technologies, and broader support for user namespaces, to name a few.


#articles
Ingress NGINX will be retired soon Another significant announcement made during KubeCon NA involved deprecation. Kubernetes SIG Network and the Security Response Committee declared that Ingress NGINX will be retired in March 2026. This Ingress controller…
Ingress NGINX retirement: helpful tools and resources

Tools and repos:
1. Ingress2gateway (we described it here before)
2. Gateway API Benchmarks lists and compares existing Gateway API implementations
3. Ingress Migration Kit is a new tool that generates Gateway API migration plans

Related posts and other activities from vendors, projects, and community:
1. Clarifications from a Gateway API maintainer
2. NGINX Inc (F5): blog post; live AMA with the NGINX team (December 10th and 11th); migration experience from a user
3. Isovalent / Cilium: blog post; migration experience from a user
4. Traefik: blog post; Ingress NGINX Migration tool from the company
5. HAProxy: blog post; migration assistance from the company
6. SUSE: blog post

#articles #tools #networking
Another bunch of interesting Kubernetes-related articles recently spotted online:

1. "It works on my cluster: a tale of two troubleshooters" by Liam Mackie, Octopus Deploy.
Kubernetes has a gift for making simple problems look complicated, and complicated problems look simple. When something breaks, you often see symptoms completely unrelated to the real cause of the problem. This leads to a problem I like to call “blaming the network team”, where problems end up being diagnosed by the wrong engineers for a given issue. [..] I’ve personally experienced this dichotomy during my time as an engineer, working on both software and infrastructure, so I’m going to tell a story from two perspectives.


2. "A Brief Deep-Dive into Attacking and Defending Kubernetes" by Alexis Obeng.
My main motivation for writing this was to better understand for myself how Kubernetes works and its attack surface. I was also inspired from talking to people in the field and realizing just how prominent Kubernetes is in corporate environments. Although I did not cover every single attack vector here, I still cover a large amount of topics in the hope that this will prove useful to others seeking to understand Kubernetes’ attack surface.


3. "Exploring Cloud Native projects in CNCF Sandbox. Part 5: 13 arrivals of January 2025" by Dmitry Shurupov, Palark.
Learn about the following new CNCF projects: Podman Container Tools and Podman Desktop, bootc, composefs, k0s, KubeFleet, SpinKube, container2wasm, Runme Notebooks for DevOps, SlimFaas, Tokenetes, CloudNativePG, and Drasi.


4. "The Real State of Helm Chart Reliability: Hidden Risks in 100+ Open‑Source Charts" by Prequel.
Prequel's reliability research team audited 105 popular Kubernetes Helm charts to reveal missing reliability safeguards. The average score was ~3.98/10. 48% (50 charts) rated "High Risk" (score ≤3/10). Only 17% (18 charts) were rated "Reliable" (≥7/10).


5. "Reclaiming underutilized GPUs in Kubernetes using scheduler plugins" by Lalit Somavarapha, Gernot Seidler, Srujana Reddy Attunuri (HPE).
The default Kubernetes preemption mechanism (DefaultPreemption) can evict lower-priority pods to make room for higher-priority ones. But it only considers priority — not actual utilization. Pods are treated equivalently from a preemption perspective when they share the same priority, regardless of their current utilization. We evaluated several existing approaches.


6. "How We Built Our Deployment Pipeline: GitOps, ArgoCD, and Kubernetes at Dodo Payments" by Ayush Agarwal, Dodo Payments.
The investment in GitOps pays off at a certain scale. Below that scale, simpler solutions work fine. For us, running a payment platform with strict requirements around security, auditability, and reliability — GitOps isn’t optional. It’s infrastructure.
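
The percentages quoted in item 4 above (the Helm chart audit) can be cross-checked against the chart counts given in the same excerpt. This is a small arithmetic sketch using only those figures; the helper name `share_percent` is made up for illustration.

```python
TOTAL_CHARTS = 105  # charts audited, per the excerpt
HIGH_RISK = 50      # charts scoring <= 3/10
RELIABLE = 18       # charts scoring >= 7/10


def share_percent(count, total=TOTAL_CHARTS):
    """Return count as a percentage of total, rounded to the nearest integer."""
    return round(100 * count / total)


high_risk_pct = share_percent(HIGH_RISK)
reliable_pct = share_percent(RELIABLE)
# Charts that fall in neither band (derived here, not stated in the excerpt).
in_between = TOTAL_CHARTS - HIGH_RISK - RELIABLE

print(high_risk_pct, reliable_pct, in_between)
```

The rounded shares match the 48% and 17% in the excerpt, which leaves 37 charts scoring between the two bands.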


#articles
Our latest selection of interesting Kubernetes-related articles recently spotted online:

1. "Kubernetes Rolling Updates for Reliable Deployments" by James Walker, Spacelift.
In this guide, we will explain the benefits of rolling updates, describe how they work, and provide detailed examples of their use. We’ll also compare how rolling updates stack up against other popular deployment strategies.


2. "Experimenting with Gateway API using kind" by Ricardo Katz, Red Hat.
This document will guide you through setting up a local experimental environment with Gateway API on kind. This setup is designed for learning and testing. It helps you understand Gateway API concepts without production complexity.


3. "Understanding the Ingress-NGINX Deprecation — Before You Migrate to the Gateway API" by Artem Lajko.
Most blog posts about the Ingress-NGINX deprecation are optimized for clicks, not for engineers who actually have to migrate production systems. You’ll find tiny demo setups, toy examples, and conclusions that fall apart the moment you apply them to an enterprise environment. That frustration is the reason this guide exists. This article is based on our real enterprise setup, built on top of the kubara framework. It documents how we approached the migration, what worked, what didn’t, and — just as important — what we decided not to migrate.


4. "Lazy-Pulling Container Images: A Deep Dive Into OCI Seekability" by Zain Malik.
This post starts with why the problem is harder than it looks at the byte level, then surveys the major approaches and what they trade off. The core of the post is a hands-on experiment: I deploy an in-cluster registry, convert images to eStargz, patch containerd with a custom snapshotter, and measure something nobody benchmarks properly. Not just pull time, but readiness, the moment a container can actually serve its first request.


5. "Kernel Archaeology: Why 36 CPUs Crash Cilium But 32 Don’t" by Pierre Magne, Qonto.
[..] The deployment looked successful. But then, over several weeks, we noticed sporadic crashes — roughly one Cilium agent per week, completely unrecoverable without restarting the entire node. No clear pattern, no obvious trigger. Rare enough to be hard to reproduce, but severe enough to block production deployment.


6. "Speeding Up FluxCD Development Without Remote Pushes: Local Git Reconciliation" by Marco Boss.
[..] I started looking for a way to develop and validate manifests locally, while still having full access to Flux features, and without resorting to brittle hacks or partial simulations. In this post, I’ll walk you through the approach I ended up with and show you how to run Flux locally in a way that actually feels usable for day-to-day development.


#articles
We’re back online after a short break, and here comes our latest selection of interesting Kubernetes-related articles recently spotted online:

1. "Making Harbor production-ready: Essential considerations for deployment" by Dhruv Tyagi and Daniel Jiang, Broadcom.
While deploying Harbor is straightforward, making it production-ready requires careful consideration of several key aspects. This blog outlines critical factors to ensure your Harbor instance is robust, secure, and scalable for production environments.


2. "Kubernetes Strategic Merge Patch" by Brian Grant, ConfigHub.
If you’ve used Kubernetes kubectl apply, server-side apply, or kustomize, then you may have encountered the “strategic merge patch” feature. “Strategic merge patch” is a mouthful. What does it mean? In what sense is it “strategic”? Why does it exist?


3. "Containers Are Not Automatically Secure" by Luca Cavallin.
Containers changed how we package and ship software, but they did not rewrite the basic security rules. Trust boundaries, privilege, and attack surface are all still there. That's one of the things I learned while digging into container security, partly from Liz Rice's Container Security and partly from spending time with the Linux pieces underneath.


4. "How Reddit Migrated Petabyte-Scale Kafka from EC2 to Kubernetes" by Alex Xu.
The Reddit Engineering Team completed one of the most demanding infrastructure migrations in the company’s history. It moved its entire Apache Kafka fleet, comprising over 500 brokers and more than a petabyte of live data, from Amazon EC2 virtual machines onto Kubernetes. The migration was done with zero downtime and without asking a single client application to change how it connected to Kafka. In this article, we will look at the breakdown of this migration, the challenges the engineering team faced, and how they achieved their goal of a successful migration.


5. "Running Agents on Kubernetes with Agent Sandbox" by Janet Kuo and Justin Santa Barbara.
[..] as AI evolves from short-lived inference requests to long-running, autonomous agents, we are seeing the emergence of a new operational pattern. AI agents, by contrast, are typically isolated, stateful, singleton workloads. [..] SIG Apps is developing agent-sandbox. The project introduces a declarative, standardized API specifically tailored for singleton, stateful workloads like AI agent runtimes.


6. "A one-line Kubernetes fix that saved 600 hours a year" by Braxton Schafer, Cloudflare.
Every time we restarted Atlantis, the tool we use to plan and apply Terraform changes, we’d be stuck for 30 minutes waiting for it to come back up. No plans, no applies, no infrastructure changes for any repository managed by Atlantis. With roughly 100 restarts a month for credential rotations and onboarding, that added up to over 50 hours of blocked engineering time every month, and paged the on-call engineer every time. This was ultimately caused by a safe default in Kubernetes that had silently become a bottleneck as the persistent volume used by Atlantis grew to millions of files. Here’s how we tracked it down and fixed it with a one-line change.
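
The headline number follows from the figures in the excerpt. Below is a quick arithmetic sketch that treats the "roughly 100 restarts" and "30 minutes" as exact round values, which is an assumption; the function name `blocked_hours` is hypothetical.

```python
RESTARTS_PER_MONTH = 100   # "roughly 100 restarts a month", taken as exact
MINUTES_PER_RESTART = 30   # wait per restart, per the excerpt


def blocked_hours(restarts, minutes_each):
    """Return total blocked time in hours for the given number of restarts."""
    return restarts * minutes_each / 60


monthly_hours = blocked_hours(RESTARTS_PER_MONTH, MINUTES_PER_RESTART)
yearly_hours = monthly_hours * 12

print(monthly_hours, yearly_hours)
```

With these round inputs the result is 50 hours a month and 600 hours a year, in line with the title; the excerpt's "over 50 hours" suggests the real inputs were slightly higher.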


#articles