Your customers should never be your monitoring system.
mkdev helps teams move from basic alerts to real observability: telemetry, tracing, debugging, and alerts that notify the right people without creating noise.
Check out the page and schedule a call: https://mkdev.me/b/consulting/observability
mkdev.me
Monitoring & Observability consulting for business | mkdev
Schedule a call to receive the Monitoring & Observability consultation from industry experts
AI explainability is not one problem. It is several problems wearing the same name.
A data scientist wants to know why a model behaves a certain way. A business leader wants to know whether the system creates value without unacceptable risk. A user wants to know whether they can rely on the output. An affected person wants to know whether they can challenge a decision. A regulator wants to know whether the company can demonstrate compliance and accountability.
The same explanation will not satisfy all of them.
This is why businesses need to treat explainability as part of AI system design, not as a marketing feature. Before choosing a model or buying a vendor solution, teams should define who needs explanations, what decisions need to be explained, and whether those explanations are meant for debugging, trust, consent, appeal, or liability.
In 2026, “AI explainability” should not be a checkbox. It should be a business requirement with clear stakeholders and clear limits.
https://mkdev.me/posts/explaining-ai-explainability-the-current-reality-for-businesses
mkdev.me
AI Explainability: Complexity, Trust & Business Impact
In the second article of his explainable AI series, Paul Larsen looks at what today’s XAI tools really deliver for different stakeholders—from users to regulators—and where they still fall short for trust, liability and high-risk decisions.
Service Mesh can make developers’ lives easier — but it’s not magic dust for every Kubernetes setup. It shines when services talk to each other a lot, and when teams agree what should be handled by infrastructure and what should stay in code.
Read the article: https://mkdev.me/posts/do-developers-need-service-mesh
mkdev.me
When Does Service Mesh Matter for Developers? | mkdev
Service Mesh sounds like something focused on infrastructure automation. Its features around traffic management, observability and security are definitely exciting for any Infrastructure Engineer. But what about developers? Is there any reason for them…
Love mkdev illustrations? You can now get many of them on t-shirts, mugs, and other items in the mkdev store — including some exclusive designs you won’t find anywhere else.
DevOps and Cloud swag, the mkdev way. Shop the mkdev store now: https://store.mkdev.me/#!/all
A surprising number of AWS accounts still run without the basic cost-management features fully enabled.
No hourly cost visibility. No resource-level data. No meaningful budgets. No anomaly alerts. No regular review of rightsizing recommendations.
Then the bill arrives, and everyone starts investigating backwards.
The better approach is simple: set up the cost observability layer before you need it. Enable Cost Explorer. Add granular data where it makes sense. Use Cost Optimization Hub and Compute Optimizer for recommendations. Configure AWS Budgets. Turn on Cost Anomaly Detection.
These steps will not replace a proper AWS audit, but they create the minimum visibility needed to make good decisions. Cloud bills should not be a monthly surprise. They should be a monitored system.
Details in the article: https://mkdev.me/posts/getting-started-with-aws-cost-optimization-6-steps-to-get-the-cloud-bill-under-control
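To make "a monitored system" concrete, here is a minimal sketch of the idea behind anomaly alerts: compare each day's spend with the trailing week and flag large jumps. It is a toy rule for illustration only, not the method AWS Cost Anomaly Detection uses. The function name, the window, the threshold and the sample figures are all assumptions.

```python
from statistics import mean, stdev


def find_cost_anomalies(daily_costs, window=7, threshold=3.0):
    """Return indices of days whose cost is far above the trailing window.

    A day is flagged when its cost exceeds the trailing mean by more than
    `threshold` standard deviations. Toy rule, assumed for illustration.
    """
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        # Guard against a perfectly flat history, where stdev is 0:
        # fall back to 1% of the baseline as the minimum spread.
        limit = baseline + threshold * max(spread, 0.01 * baseline)
        if daily_costs[i] > limit:
            anomalies.append(i)
    return anomalies


# Hypothetical daily spend in USD; the value at index 9 is an unexpected spike.
costs = [100, 102, 98, 101, 99, 103, 100, 101, 99, 180, 102]
print(find_cost_anomalies(costs))  # prints [9]
```

In practice the managed AWS features listed above are the better starting point; the sketch only shows why a baseline plus a threshold beats reading the invoice at the end of the month.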
mkdev.me
AWS Cost Optimization: 6 Steps to Control Your Cloud Bill
From AWS's tool to analyze your costs not being enabled by default to configuring AWS Cost Anomaly Detection, there's a lot to do to get started with your AWS cost optimization. Here are the first 6 steps for you to make.
Running Kubernetes on-prem, in the cloud, or both? mkdev’s Kubernetes Audit & Assessment looks at operations, security, service mesh, observability, capacity and how your apps can actually benefit from Kubernetes. Check out the page and schedule a call: https://mkdev.me/b/audits/kubernetes-audit-assessment
mkdev.me
Kubernetes Audit and Assessment | mkdev audits for business
As part of the Kubernetes Audit and Assessment, we conduct a deep review of your setup, from security and high availability to cost and automation. We provide you with a detailed report on all angles of Kubernetes usage, from cluster operations to developer experience.
AI image generation becomes much more interesting when you stop thinking about it as a standalone feature. The model is only one part of the system. The rest is context management, iteration, file handling, parameters, quality checks, and the ability to repeat the process without losing your mind.
That’s what this article is about. Kirill took Nano Banana Pro, later added GPT Image 2, and wrapped both into Claude Code Skills. This allowed Claude Code to generate images through a small Python script, inspect the outputs, notice problems, and continue improving the result.
For product teams, this is where the practical value starts. You can brainstorm app icons, create mascot variations, generate high-resolution visuals, localize screenshots, and explore many directions without manually restarting the process every time.
The broader lesson is simple: AI tools become dramatically more useful when they are connected to real workflows. The future is not just “better prompts”. It is small, composable tools that let AI agents actually do the work around the model.
Read the full post here: https://mkdev.me/posts/unlimited-image-generation-with-nano-banana-pro-gpt-image-2-and-claude-code-skills
mkdev.me
Unlimited Image Generation: Nano Banana Pro & GPT Image 2
Nano Banana Pro and OpenAI's GPT Image 2 are top-tier image gen models right now — and Kirill wired both into Claude Skills. 100+ icon iterations, 4K control, self-critiquing generations, and sane context handling. $45 well spent.
Want to pass the Certified Kubernetes Administrator exam?
Don’t try to memorize Kubernetes. Learn how it works, practice real tasks, master kubectl, reuse YAML when possible, and make sure your basic Linux skills are solid.
This video explains 6 simple but important tips that can save you time during the exam.
Watch the full video and prepare smarter:
https://www.youtube.com/watch?v=Hk07gXekQ1c
A lot has changed in cloud security. The basics have not.
AI workloads, Kubernetes platforms, multi-cloud setups, serverless services, and managed databases all add complexity. But the same core questions still decide whether your environment is reasonably secure:
Who has access? What data is sensitive? What is encrypted? What is logged? Who owns each security responsibility? How often are settings reviewed? What happens during an incident?
That is exactly what a good cloud security checklist should force you to answer.
We put together 7 essential steps for reducing cloud security risk, from data classification and IAM to monitoring, automated audits, and tested response plans.
If your cloud setup has grown faster than your security process, this is a good place to start.
https://mkdev.me/posts/cloud-security-checklist-7-essential-steps
mkdev.me
Reduce Cloud Security Risks with 7 Essential Steps | mkdev
Cloud security isn't just a tech problem—it's a human one, with 99% of breaches caused by simple user mistakes. In this article, Kirill Shirinkin offers a no-nonsense 7-step checklist that any team can follow to dramatically reduce cloud security risks, plus…
Good infrastructure code should be like good application code: clear, tested, versioned and automatically deployed.
That’s the mindset behind mkdev’s Infrastructure as Code & GitOps consulting.
Check out the page and schedule a call: https://mkdev.me/b/consulting/iac
mkdev.me
Infrastructure as Code & GitOps consultation for business | mkdev
Schedule a call to receive the Infrastructure Deployment consultation for Advanced level developers from industry experts
Prompt engineering is not security engineering.
This is one of the hardest lessons for product managers building with GenAI. A system prompt may look like a clean set of rules, but it is not the same as traditional application logic. It does not guarantee behavior. It is more like a very strongly worded suggestion to the model.
That matters when your AI feature is exposed to users. A customer-facing assistant might be told not to reveal sensitive data, not to generate illegal content, not to override company policies, and not to take dangerous actions. But malicious users can still try to bypass those instructions through jailbreaks or prompt injection attacks.
The business impact is not theoretical. A badly controlled AI system can create reputational damage, legal exposure, data leakage, or operational incidents. For PMs, that means AI features need proper boundaries beyond “we wrote a careful prompt.”
Good GenAI product management means asking: What can the model access? What actions can it trigger? What happens if the user tries to manipulate it? What checks exist outside the model itself?
We covered the practical risks product managers should understand in this article.
Read it here: https://mkdev.me/posts/genai-security-risks-for-product-managers-dd73bdc2-4f2e-4227-93b3-375da081d906
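One way to picture "checks outside the model" is a small authorization step in ordinary code that runs before any model-proposed action is executed. The sketch below is an assumption-heavy illustration, not the approach from the article: the action names, the `ProposedAction` type and the rules are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical action names; a real product would define its own list.
ALLOWED_ACTIONS = {"lookup_order", "send_faq_link"}


@dataclass
class ProposedAction:
    name: str            # action the model asked the application to run
    user_id: str         # authenticated user talking to the assistant
    target_user_id: str  # user whose data the action would touch


def authorize(action: ProposedAction) -> tuple[bool, str]:
    """Decide in plain application code whether a proposed action may run."""
    if action.name not in ALLOWED_ACTIONS:
        return False, f"action '{action.name}' is not allowlisted"
    if action.user_id != action.target_user_id:
        return False, "users may only act on their own data"
    return True, "ok"


print(authorize(ProposedAction("lookup_order", "u1", "u1")))
print(authorize(ProposedAction("delete_account", "u1", "u1")))
print(authorize(ProposedAction("lookup_order", "u1", "u2")))
```

The point of the design is that the decision does not depend on the prompt: even if a jailbreak convinces the model to request `delete_account`, the application refuses because the rule lives outside the model.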
mkdev.me
GenAI Security Risks for Product Managers
The third article of this series by Paul Larsen warns product managers about the major cybersecurity risks of GenAI—like data leaks, prompt jailbreaks, and injection attacks—and offers practical steps to keep AI use productive without endangering company…
Trying to reduce your Google Cloud Run costs?
Start with the less obvious places: VPC connectors, direct egress, and whether CPU really needs to be always allocated.
We explain both tips with a real billing example here: https://mkdev.me/posts/2-simple-tips-to-reduce-your-google-cloud-run-costs
mkdev.me
Cut Google Cloud Run Costs: 2 Proven Tips | mkdev
Discover key strategies to optimize costs with Google Cloud Run, including committed use discounts, reducing Compute Engine expenses through VPC connector adjustments, and managing CPU allocation. Learn how to cut cloud costs by up to 50% with practical examples.
This free course is all about understanding ArgoCD from the ground up. We will look at what ArgoCD does, why it matters, and how it organizes projects, applications, and deployments through its main features.
Articles: https://mkdev.me/posts/what-is-argo-cd-and-why-would-you-need-gitops
Video: https://www.youtube.com/playlist?list=PLozcbFx8FoPHUHoKfuSrkMO0ulZD-CHHu
mkdev.me
ArgoCD & GitOps: Lightning Course for Kubernetes | mkdev
Dive into the introductory lesson of our ArgoCD Lightning Course. Designed for Kubernetes and Helm users, this article outlines ArgoCD's basics, explaining its role as a declarative GitOps deployment tool. Understand the difference between imperative and…
At small scale, microservices feel manageable.
At larger scale, every service needs to find other services, communicate securely, expose useful telemetry, support traffic shifting, and follow consistent authorization rules. Doing this separately in every application quickly becomes a mess.
That is where service mesh comes in. It gives platform teams a common layer for service-to-service communication, usually through a control plane and a data plane made of proxies.
Google Cloud’s Anthos Service Mesh, now Cloud Service Mesh, is one way to bring this model into GKE. It can simplify parts of the operational story, especially if you want managed mesh capabilities. But it also introduces important tradeoffs around sidecars, Envoy, Istio APIs, GKE Dataplane V2, eBPF, and Cilium.
The article is a good reminder that “managed” does not mean “you do not need to understand it”.
In 2026, service mesh is still powerful. It is also still something you should adopt deliberately.
https://mkdev.me/posts/is-google-cloud-anthos-service-mesh-a-mess
mkdev.me
Google Cloud Anthos: Demystifying Service Mesh | mkdev
Explore how Google Cloud harnesses the power of service mesh for managing microservices in GCP. Pablo Inigo Sanchez's article breaks down the complexity of implementing service mesh in GKE using Anthos, highlighting the use of Istio and the potential of eBPF…
Infrastructure problems rarely announce themselves early. mkdev audits look into your cloud, Kubernetes and security setup, identify what needs improvement, and turn it into a practical action plan for your team. Check out the page and schedule a call: https://mkdev.me/b/audits
mkdev.me
mkdev audits and assessments for business
A full scope audit and analysis of your infrastructure and applications, including cost analysis and data protection, with a detailed report on how to improve your infrastructure
ClickOps is annoying when you have one project. It becomes dangerous when you have many.
That applies to OpenAI as much as it applies to AWS, Kubernetes or any other infrastructure platform. Once you have multiple teams, multiple projects, service accounts, API keys, limits and access rules, manual configuration becomes a source of inconsistency.
The Open Source Terraform Provider for OpenAI was built around that problem. It brings OpenAI administration into Terraform, so teams can manage resources in code instead of relying on screenshots, tribal knowledge and “who created this key?” conversations.
There is also a more experimental side: using OpenAI platform APIs inside Terraform workflows, including model responses and image generation, and even combining them with cloud providers like AWS.
It is a fun example, but the larger point is serious: GenAI platforms need the same engineering discipline as the rest of your infrastructure.
https://mkdev.me/posts/announcing-the-open-source-terraform-provider-for-openai
mkdev.me
Introducing Open Source OpenAI Terraform Provider | mkdev
Tired of managing OpenAI configs by hand? mkdev just open-sourced a Terraform provider that lets you automate everything—from API keys to generative AI workflows—all as code. Read the article to see how it turns infrastructure and AI into one seamless experience.
Public IP, private IP, Cloud Run, Cloud SQL, Serverless VPC Connector… Google Cloud networking can get confusing fast. This video breaks down one practical setup step by step. Watch it now.
https://www.youtube.com/watch?v=MeynQIt3TD8
YouTube
How to connect Cloud Run and Cloud SQL internally
In this video we learn how to switch a Cloud SQL instance that connects to Cloud Run over an external IP to an internal IP and a Serverless VPC Connector.
* https://mkdev.me/b/audits
If you or your company need consulting and training…
From DevOps and Cloud to AI: get the latest thoughts by Pablo and Kirill on all the news topics + a collection of personally curated interesting links, every other week in your Inbox! Subscribe to mkdev dispatch here: https://mkdev.me/categories/newsletter
mkdev.me
mkdev Dispatch – Bi-Weekly Newsletter on DevOps, AI & Cloud Native
Subscribe to mkdev dispatch, your go-to bi-weekly newsletter for insightful articles and essays on DevOps, Public Cloud, and Cloud Native technologies.
Cloud cost optimization usually starts with quick wins: delete unused resources, rightsize oversized instances, clean up old snapshots, shut down non-production environments outside working hours, and add budget alerts.
But the bigger savings usually come later, when teams start treating cost as an architectural constraint.
That means choosing the right purchase model for steady workloads, using spot capacity where interruptions are acceptable, moving rarely accessed data to cheaper storage tiers, avoiding unnecessary cross-region traffic, and designing systems that scale with demand instead of running at peak capacity all the time.
The goal is not to spend as little as possible. The goal is to stop paying for waste while keeping reliability and performance where they need to be.
https://mkdev.me/posts/the-ultimate-guide-to-cloud-cost-optimization
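As a rough worked example of one quick win mentioned above, shutting down non-production environments outside working hours, the sketch below computes what share of the week an environment actually runs on a fixed schedule. The 12-hours-a-day, 5-days-a-week schedule is an assumed example, and the result only applies to charges billed by running time; storage and other always-on charges are not covered.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_running_share(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of the week an environment runs on a fixed schedule."""
    if not 0 <= hours_per_day <= 24 or not 0 <= days_per_week <= 7:
        raise ValueError("schedule must fit inside a week")
    return hours_per_day * days_per_week / HOURS_PER_WEEK


# Assumed schedule: 12 hours a day on 5 working days.
share = weekly_running_share(hours_per_day=12, days_per_week=5)
print(f"running {share:.0%} of the week, idle {1 - share:.0%}")
# prints: running 36% of the week, idle 64%
```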
mkdev.me
Cloud Cost Optimization: Proven Tactics for Savings | mkdev
Most companies are bleeding money in the cloud without realizing it — but it doesn’t have to be this way. In this article, Kirill Shirinkin breaks down practical, no-nonsense strategies that can cut cloud costs by up to 72% without sacrificing performance.…
Cloud projects don’t fail because AWS or GCP lack options. They fail because there are too many options, too many shortcuts and not enough clarity. mkdev helps teams design practical cloud solutions that fit their business. Check out the page and schedule a call: https://mkdev.me/b/consulting/public-cloud
mkdev.me
Public Cloud (AWS and GCP) Consulting | mkdev
Schedule a call to get a first consultation about your AWS or GCP project with us