Kube Builders
News and links on infrastructure and building Kubernetes clusters curated by the @Learnk8s team
Gonzo lets you use a terminal UI to stream and analyse logs in real time, with support for OpenTelemetry (OTLP), AI-powered insights, heatmaps and advanced filtering.

More: https://ku.bz/zLysqtt93
Forwarded from KubeFM
Sai Vennam, Principal Solutions Architect at Amazon Web Services (AWS), breaks down three Kubernetes tools that are reshaping how teams handle advanced operational challenges. He covers:

1. Kueue for intelligent GPU job scheduling that moves beyond Kubernetes' default first-come-first-served approach
2. Karpenter's node overlays for customizing instance selection based on organizational discount plans and preferences
3. Kro for composing complex infrastructure as YAML without custom controllers

Watch the full interview: https://ku.bz/MgHpbXg4Y
This article explains the challenges of monitoring Kubernetes CronJobs compared to traditional cron, covers common failure modes such as resource limitations and silent failures, and recommends heartbeat monitoring as an alternative to Prometheus.
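
To make the heartbeat idea concrete, here is a minimal Python sketch of a dead-man's-switch check. It is not taken from the article: the function name `is_overdue`, the hourly interval, and the 5-minute grace period are illustrative assumptions. The job pings after each successful run, and the monitor alerts when no ping arrives within the expected interval plus the grace period.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_ping: datetime, interval: timedelta,
               grace: timedelta, now: datetime) -> bool:
    """Return True when the job has missed its expected heartbeat.

    A job that never starts, is killed, or hangs sends no ping,
    so the missing ping itself becomes the alert signal.
    """
    return now > last_ping + interval + grace


# Hypothetical hourly CronJob with a 5-minute grace period.
last_ping = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
interval = timedelta(hours=1)
grace = timedelta(minutes=5)

# 13:04 is still inside the window; 13:06 is past it.
on_time = is_overdue(last_ping, interval, grace,
                     now=datetime(2024, 1, 1, 13, 4, tzinfo=timezone.utc))
late = is_overdue(last_ping, interval, grace,
                  now=datetime(2024, 1, 1, 13, 6, tzinfo=timezone.utc))
print(on_time, late)
```

The check needs no access to the cluster, which is why it can also catch jobs that were never scheduled at all.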

More: https://ku.bz/ZfvllhXQC
Forwarded from KubeFM
Grzegorz Głąb, Kubernetes Engineer at Cloud Kitchens, explains how his team solved mysterious network packet drops affecting I/O-sensitive workloads.

He describes how the team traced the problem to VM freeze events (when machines move between physical hosts), which left network interrupts handled by just 1-2 CPU cores instead of being distributed across all available cores, creating a processing bottleneck.

The team implemented a two-part automated solution:

- A node-level detector that identifies when fewer than half of the CPU cores are handling network interrupts
- A DaemonSet that automatically restarts the irqbalance service on affected nodes

This automation has successfully prevented packet drops for over a year, demonstrating how Kubernetes can be extended to solve infrastructure-level problems that impact application performance.
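
The detector half of that solution could look roughly like the Python sketch below. This is an illustration, not the team's implementation: it assumes the data comes from two snapshots of `/proc/interrupts`, that network IRQ rows can be recognised by a driver name (`virtio` here is a hypothetical hint), and the function names are invented. Counters in `/proc/interrupts` are cumulative since boot, so the sketch compares two samples rather than reading one.

```python
def network_irq_counts(interrupts_text: str, hint: str = "virtio") -> list[int]:
    """Sum per-CPU interrupt counts for rows whose text contains `hint`."""
    lines = interrupts_text.strip().splitlines()
    num_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * num_cpus
    for line in lines[1:]:
        fields = line.split()
        if hint not in line or len(fields) < num_cpus + 1:
            continue
        # fields[0] is the IRQ number ("24:"), then one count per CPU.
        for cpu, value in enumerate(fields[1:num_cpus + 1]):
            totals[cpu] += int(value)
    return totals


def needs_rebalance(before: list[int], after: list[int]) -> bool:
    """True when fewer than half of the CPUs handled network interrupts."""
    active = sum(1 for b, a in zip(before, after) if a > b)
    return active < len(after) / 2


# Made-up snapshots: only CPU0's counters move between the two samples.
SAMPLE_T0 = """\
           CPU0       CPU1       CPU2       CPU3
 24:       1000        900        800        700   PCI-MSI 1-edge   virtio0-input.0
 25:        500        400        300        200   PCI-MSI 2-edge   virtio0-output.0
"""
SAMPLE_T1 = """\
           CPU0       CPU1       CPU2       CPU3
 24:       5000        900        800        700   PCI-MSI 1-edge   virtio0-input.0
 25:       2500        400        300        200   PCI-MSI 2-edge   virtio0-output.0
"""

skewed = needs_rebalance(network_irq_counts(SAMPLE_T0),
                         network_irq_counts(SAMPLE_T1))
print(skewed)
```

On a real node the two snapshots would be read from `/proc/interrupts` a few seconds apart, and a positive result would trigger the restart step.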

Watch the full episode: https://ku.bz/yg_fkP0LN
KubeAttention is a machine learning-powered Kubernetes scheduler plugin that uses eBPF telemetry to detect noisy neighbor interference and place latency-sensitive workloads on optimal nodes.

More: https://ku.bz/S6KNy0_4h
Forwarded from LearnKube news
InfraLens is a zero-instrumentation observability tool that uses eBPF to automatically discover and visualize service-to-service communication in Kubernetes clusters without requiring code changes or sidecars.

More: https://ku.bz/FljwMJsXy
Forwarded from KubeFM
Shyam Jeedigunta, Principal Engineer at Amazon Web Services (AWS), explains connectivity patterns for hybrid Kubernetes deployments where worker nodes run outside the core cluster network.

He covers the trade-offs between public internet connectivity and private networking solutions, focusing on maintaining reliability and performance while preserving security isolation.

Watch the full interview: https://ku.bz/m89tLbgcq
This article shows how running the Nomad server control plane on OpenShift with StatefulSets lets you manage distributed edge fleets that Kubernetes can't reach, while OpenShift handles server lifecycle, security, and observability automatically.

More: https://ku.bz/-5g5fZCYL
Forwarded from Kube Architect
Forecastle is a control panel that dynamically discovers applications deployed on Kubernetes and provides a launchpad to access them.

More: https://ku.bz/1ZZZSgjLj
Forwarded from KubeFM
Vincent von Büren was refactoring an old Helm chart when he spotted a debug log line printing a Kubernetes ServiceAccount token to stdout — still running in production.

He decoded it: no audience restrictions, one-year expiry. "My stomach turned. I knew this could be a serious security incident."

In this episode, Vincent breaks down:

- What's actually inside a ServiceAccount JWT
- Why default tokens enable replay attacks
- Projected tokens: the solution that's been available since Kubernetes 1.20, and why most teams haven't switched
- Practical steps to reduce exposure
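
To see what sits inside such a token, the Python sketch below decodes a JWT payload and flags the two problems Vincent describes. It is a sketch under assumptions, not material from the episode: the helper names, the one-hour lifetime threshold, and the sample token are made up for illustration, and the signature is deliberately not verified. The claim names `aud`, `iat`, and `exp` are the standard JWT ones.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the claims of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def exposure_findings(claims: dict, max_lifetime_s: int = 3600) -> list[str]:
    """List reasons a leaked token would be easy to replay."""
    findings = []
    if not claims.get("aud"):
        findings.append("no audience restriction")
    if "exp" not in claims:
        findings.append("no expiry")
    elif claims["exp"] - claims.get("iat", claims["exp"]) > max_lifetime_s:
        findings.append("long-lived token")
    return findings


def _b64url(data: dict) -> str:
    """Encode a dict the way a JWT segment is encoded (sample data only)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token: no audience, one-year expiry, placeholder signature.
claims_in = {
    "sub": "system:serviceaccount:default:demo",
    "iat": 1700000000,
    "exp": 1700000000 + 365 * 24 * 3600,
}
token = ".".join([_b64url({"alg": "RS256", "typ": "JWT"}),
                  _b64url(claims_in),
                  "signature-placeholder"])

claims = decode_jwt_payload(token)
findings = exposure_findings(claims)
print(findings)
```

Anyone who can read the log line can run the same decoding, which is why a token printed to stdout should be treated as compromised.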

Watch (or listen to) it here: https://ku.bz/LTnB_Ntbc

🌟 This episode is brought to you by LearnKube — comprehensive Kubernetes training. https://learnkube.com/training

With @Birthmarkb
CronJob Guardian monitors Kubernetes CronJobs with dead-man's switch detection, SLA tracking for success rates and duration regressions, intelligent alerting via Slack/PagerDuty/webhook/email, and a built-in web dashboard with charts and metrics export.

More: https://ku.bz/N2-98L3pg
Forwarded from Kube Careers
This week's 6 Kubernetes jobs that offer visa sponsorship are:

Machine Learning Engineer with Anthropic
💰 $350.71K to $851.73K a year
Hybrid in Zurich, CH, On-site in San Francisco, CA, USA
https://ku.bz/8QWBc6mRK

Site Reliability Engineer with OpenAI
💰 $230K to $490K a year
On-site in San Francisco, CA, USA
https://ku.bz/qZFG_pnlB

Platform Engineer with The San Francisco Compute Company
💰 $250K to $325K a year
Remote from the United States of America
https://ku.bz/Qqg1zYQzR

DevOps Engineer with Parloa
💰 $225K to $335K a year
Remote from the United States of America, Hybrid in Berlin, DE; Munich, DE
https://ku.bz/n4xTCdHsz

👉 Browse 5345 jobs on Kube Careers https://kube.careers
Forwarded from LearnKube news
This week on Learn Kubernetes Weekly 177:

What Happens When You Run Java at Scale on Kubernetes
🚀 From Push to Production: Our Deployment Pipeline with Argo CD
From Minutes to Seconds: How I Eliminated Kubernetes Image Pull Delays
🏕️ Nomad on OpenShift: The Case for the Control Plane
🔬 Deep Dive: The Linkerd Destination Service

Read it now: https://kube.today/issues/177

⭐️ This newsletter is brought to you by Spectro Cloud, helping you scale K8s infrastructure for AI workloads — from cloud to edge https://ku.bz/JD0dS5lhZ
Forwarded from KubeFM
Molly Sheets, Director of Engineering, Kubernetes at Zynga, challenges the conventional wisdom that slower deployments are safer deployments. She argues that intentionally slowing down Kubernetes deployments through manual approval gates actually makes systems less resilient, not more secure.

Drawing from the research in "Accelerate" and DORA metrics, Molly explains how external approvers can introduce more risk than allowing teams to deploy faster to production. In the Kubernetes context specifically, she emphasizes that the architecture should focus on isolation between applications, enabling teams to release independently without affecting others.

Her core philosophy: "break things, fix it permanently, and keep moving on" with smaller, faster deployments.

Watch the full episode: https://ku.bz/Rmpl8948_
IncidentFox automates incident investigation with AI agents using 178+ tools for Kubernetes, AWS, and Grafana. It features a RAPTOR knowledge base for runbooks, alert correlation that reduces noise by 85-95%, and Slack/GitHub/PagerDuty integrations.

More: https://ku.bz/wTP3Kbtjs
Forwarded from KubeFM
Topology spread constraints are widely used, but most teams don't know the edge cases that can silently break their HA setup.

Jason Deal covers two:

1. TSCs are only evaluated at scheduling time, so they can drift as your cluster churns. Running the Descheduler helps enforce conformance over time.
2. During a rolling deployment, TSCs can match against pods from both the old and new ReplicaSet simultaneously — causing skew violations mid-rollout. The fix: use matchLabelKeys to scope the constraint to just the current pod template hash.
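
The second edge case is easier to see with numbers. The Python sketch below is an illustration with made-up pods and zones, not an example from the interview: it computes skew (the highest per-zone pod count minus the lowest) once across every pod with the `app` label, and once scoped to the new `pod-template-hash`, which is the scoping that `matchLabelKeys: ["pod-template-hash"]` gives you.

```python
from collections import Counter


def skew(pods: list[dict], zones: list[str], match: dict) -> int:
    """Max minus min count of matching pods per zone."""
    counts = Counter(
        p["zone"] for p in pods
        if all(p["labels"].get(k) == v for k, v in match.items())
    )
    per_zone = [counts.get(z, 0) for z in zones]
    return max(per_zone) - min(per_zone)


def pod(zone: str, template_hash: str) -> dict:
    """Build a hypothetical pod record with the labels a Deployment sets."""
    return {"zone": zone,
            "labels": {"app": "web", "pod-template-hash": template_hash}}


ZONES = ["a", "b", "c"]

# Mid-rollout snapshot: three old pods are still running, three new ones exist.
pods = [pod("a", "old"), pod("a", "old"), pod("b", "old"),
        pod("b", "new"), pod("c", "new"), pod("c", "new")]

# Selector on app only: old and new pods are counted together.
combined = skew(pods, ZONES, {"app": "web"})

# Scoped to the current template hash: only the new ReplicaSet counts.
scoped = skew(pods, ZONES, {"app": "web", "pod-template-hash": "new"})

print(combined, scoped)
```

In this snapshot the combined view looks perfectly balanced, yet the new ReplicaSet on its own has no pod in zone `a`, which is the spread you are left with once the old pods terminate.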



Watch the full interview: https://ku.bz/1_-DTgLsg
Forwarded from KubeFM
On-prem Kubernetes means you own all of it — and that adds up fast.

Raglin Anthony lists the day-to-day pain: control plane patching and upgrades, DNS management across multiple clusters, certificate expiry stalling operations, fragile IAM built on static kubeconfig files, and observability tools running inside the clusters they're supposed to be watching.

Each one is manageable in isolation. Together, they're a full-time job.



Watch the full interview: https://ku.bz/2XqMJnLVx
Forwarded from LearnKube news
Kor is a tool to discover unused Kubernetes resources.

Currently, Kor can identify and list unused:

- ConfigMaps
- Secrets
- Services
- ServiceAccounts
- Deployments
- StatefulSets
- Roles

More: https://ku.bz/J7zwN_LWt
Endpoint-Monitoring Operator probes HTTP/JSON, TCP, DNS, ICMP, Trino, and OpenSearch endpoints via a simple CRD, with built-in Slack and email alerting.

More: https://ku.bz/NqnYpDsKW
This tutorial shows how to set up a local DNS server for demo environments using dnsmasq and Docker containers.

More: https://ku.bz/r6rbLZ-dH
Forwarded from KubeFM
Rohit Agrawal from Databricks discusses replacing Kubernetes networking with a proxy-less, client-side load balancing system and eliminating 20-30% over-provisioning across hundreds of services.

You will learn:

- Why kube-proxy's L4 routing breaks down for gRPC: it picks a backend once per connection, not per request
- How Databricks built an Endpoint Discovery Service that streams real-time pod metadata to every client
- How zone-aware spillover cuts cross-AZ costs without sacrificing availability
- Why CPU-based routing failed and what signals to use instead
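
The first point can be shown with a small Python simulation. It is a simplified sketch, not Databricks' system: the pod names and client counts are made up, and round-robin stands in for whatever selection a real proxy or client uses. gRPC multiplexes many requests over one long-lived HTTP/2 connection, so a balancer that decides per connection makes very few decisions.

```python
from collections import Counter
from itertools import cycle


def per_connection(clients: int, requests_each: int,
                   backends: list[str]) -> Counter:
    """L4 style: pick a backend when the connection opens, then reuse it."""
    picker = cycle(backends)
    load = Counter({b: 0 for b in backends})
    for _ in range(clients):
        backend = next(picker)          # chosen once per connection
        load[backend] += requests_each  # every request rides that connection
    return load


def per_request(clients: int, requests_each: int,
                backends: list[str]) -> Counter:
    """Client-side style: pick a backend again for every request."""
    picker = cycle(backends)
    load = Counter({b: 0 for b in backends})
    for _ in range(clients * requests_each):
        load[next(picker)] += 1
    return load


BACKENDS = ["pod-a", "pod-b", "pod-c", "pod-d"]

# Two clients, each sending 100 requests over a single connection.
l4 = per_connection(2, 100, BACKENDS)
l7 = per_request(2, 100, BACKENDS)

print(dict(l4))
print(dict(l7))
```

With per-connection selection, two of the four pods receive all the traffic and the other two sit idle, which is the kind of imbalance that pushes teams to over-provision.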

Watch (or listen to) it here: https://ku.bz/y803JMhBk

🌟 Sponsored by LearnKube — Kubernetes training, online or in-person. https://learnkube.com/training

With @Birthmarkb