mkdev
In the 92nd mkdev dispatch, Kirill explains why AWS ECS Express Mode is a disappointment. Subscribe to our bi-weekly newsletter where we talk about all things DevOps, Cloud and AI: https://mkdev.me/posts/aws-ecs-express-mode-is-a-dissapointment-92
The natural evolution of server management usually looks like this: first, you SSH into one machine and install everything manually. Then you save the commands into shell scripts. Then you add staging, production, load balancers, databases, monitoring agents, firewalls, SSH hardening and user accounts.

And suddenly your “simple scripts” become a pile of infrastructure folklore.

Configuration management tools solve this by moving from “run these commands” to “this is the state the server should be in.” The tool figures out what needs to change and can safely re-apply the configuration again and again.

That last part matters a lot. Infrastructure is not configured once. It drifts. People patch things manually. Emergency fixes happen. Defaults change. Compliance requirements evolve. In 2026, configuration management is still relevant because drift is still real.
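
To make the "desired state" idea concrete, here is a minimal Python sketch. It is not how Ansible, Puppet or Chef are implemented. The helper names (ensure_line, apply) and the SSH settings are hypothetical, and the sketch only adds missing lines, it does not remove conflicting ones. The point is that re-applying changes nothing, while drift gets corrected:

```python
import tempfile
from pathlib import Path
from typing import List


def ensure_line(path: Path, line: str) -> bool:
    """Ensure `line` is present in the file. Return True only if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state: do nothing
    path.write_text("\n".join(existing + [line]) + "\n")
    return True


def apply(path: Path, desired_lines: List[str]) -> List[str]:
    """Converge the file towards the desired lines; return the lines that had to be added."""
    return [line for line in desired_lines if ensure_line(path, line)]


# Hypothetical desired state for an SSH daemon config (illustrative values only).
DESIRED = ["PermitRootLogin no", "PasswordAuthentication no"]

with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "sshd_config"
    first_run = apply(config, DESIRED)    # file does not exist yet, so both lines are added
    second_run = apply(config, DESIRED)   # nothing changes: re-applying is safe
    config.write_text("PermitRootLogin no\n")  # simulate drift: a manual edit drops a line
    after_drift = apply(config, DESIRED)  # only the missing line is restored
    print(first_run, second_run, after_drift)
```

A shell script that blindly appends the same lines would duplicate them on every run. Checking state first is what makes the repeat run a no-op.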

Whether you use Ansible, Puppet, Chef or another tool, the core idea remains the same: make infrastructure configuration repeatable, reviewable and recoverable.

More in the mkdev article: https://mkdev.me/posts/what-is-configuration-management-and-why-you-need-ansible-chef-puppet-and-others
Your customers should never be your monitoring system.

mkdev helps teams move from basic alerts to real observability: telemetry, tracing, debugging, and alerts that notify the right people without creating noise.

Check out the page and schedule a call: https://mkdev.me/b/consulting/observability
AI explainability is not one problem. It is several problems wearing the same name.

A data scientist wants to know why a model behaves a certain way. A business leader wants to know whether the system creates value without unacceptable risk. A user wants to know whether they can rely on the output. An affected person wants to know whether they can challenge a decision. A regulator wants to know whether the company can demonstrate compliance and accountability.

The same explanation will not satisfy all of them.

This is why businesses need to treat explainability as part of AI system design, not as a marketing feature. Before choosing a model or buying a vendor solution, teams should define who needs explanations, what decisions need to be explained, and whether those explanations are meant for debugging, trust, consent, appeal, or liability.

In 2026, “AI explainability” should not be a checkbox. It should be a business requirement with clear stakeholders and clear limits.

https://mkdev.me/posts/explaining-ai-explainability-the-current-reality-for-businesses
Service Mesh can make developers’ lives easier — but it’s not magic dust for every Kubernetes setup. It shines when services talk to each other a lot, and when teams agree what should be handled by infrastructure and what should stay in code.

Read the article: https://mkdev.me/posts/do-developers-need-service-mesh
Love mkdev illustrations? You can now get many of them on t-shirts, mugs, and other items in the mkdev store — including some exclusive designs you won’t find anywhere else.

DevOps and Cloud swag, the mkdev way. Shop the mkdev store now: https://store.mkdev.me/#!/all
A surprising number of AWS accounts still run without the basic cost-management features fully enabled.

No hourly cost visibility. No resource-level data. No meaningful budgets. No anomaly alerts. No regular review of rightsizing recommendations.

Then the bill arrives, and everyone starts investigating backwards.

The better approach is simple: set up the cost observability layer before you need it. Enable Cost Explorer. Add granular data where it makes sense. Use Cost Optimization Hub and Compute Optimizer for recommendations. Configure AWS Budgets. Turn on Cost Anomaly Detection.

These steps will not replace a proper AWS audit, but they create the minimum visibility needed to make good decisions. Cloud bills should not be a monthly surprise. They should be a monitored system.
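
As a rough illustration of treating the bill as a monitored system, here is a small Python sketch that flags daily cost spikes. It assumes the input follows the shape of a Cost Explorer get_cost_and_usage response with the UnblendedCost metric. The sample data, the 7-day window and the 1.5x factor are made up, and this naive check is not the algorithm behind AWS Cost Anomaly Detection:

```python
from statistics import mean
from typing import Dict, List


def daily_totals(response: Dict) -> List[float]:
    """Extract one cost figure per day from a Cost Explorer-style response."""
    return [
        float(day["Total"]["UnblendedCost"]["Amount"])
        for day in response["ResultsByTime"]
    ]


def spike_days(costs: List[float], window: int = 7, factor: float = 1.5) -> List[int]:
    """Return indexes of days whose cost exceeds `factor` times the trailing average."""
    flagged = []
    for i in range(window, len(costs)):
        baseline = mean(costs[i - window:i])
        if baseline > 0 and costs[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Hypothetical sample: seven quiet days followed by one expensive day.
sample_response = {
    "ResultsByTime": [
        {"Total": {"UnblendedCost": {"Amount": amount, "Unit": "USD"}}}
        for amount in ["10.0"] * 7 + ["30.0"]
    ]
}

costs = daily_totals(sample_response)
flagged = spike_days(costs)
print(flagged)
```

In a real account you would feed this from the Cost Explorer API or, better, rely on the managed services listed above and keep a check like this only as a sanity test.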

Details in the article: https://mkdev.me/posts/getting-started-with-aws-cost-optimization-6-steps-to-get-the-cloud-bill-under-control
Running Kubernetes on-prem, in the cloud, or both? mkdev’s Kubernetes Audit & Assessment looks at operations, security, service mesh, observability, capacity and how your apps can actually benefit from Kubernetes. Check out the page and schedule a call: https://mkdev.me/b/audits/kubernetes-audit-assessment
AI image generation becomes much more interesting when you stop thinking about it as a standalone feature. The model is only one part of the system. The rest is context management, iteration, file handling, parameters, quality checks, and the ability to repeat the process without losing your mind.

That’s what this article is about. Kirill took Nano Banana Pro, later added GPT Image 2, and wrapped both into Claude Code Skills. This allowed Claude Code to generate images through a small Python script, inspect the outputs, notice problems, and continue improving the result.

For product teams, this is where the practical value starts. You can brainstorm app icons, create mascot variations, generate high-resolution visuals, localize screenshots, and explore many directions without manually restarting the process every time.

The broader lesson is simple: AI tools become dramatically more useful when they are connected to real workflows. The future is not just “better prompts”. It is small, composable tools that let AI agents actually do the work around the model.

Read the full post here: https://mkdev.me/posts/unlimited-image-generation-with-nano-banana-pro-gpt-image-2-and-claude-code-skills
Want to pass the Certified Kubernetes Administrator exam?

Don’t try to memorize Kubernetes. Learn how it works, practice real tasks, master kubectl, reuse YAML when possible, and make sure your basic Linux skills are solid.

This video explains 6 simple but important tips that can save you time during the exam.

Watch the full video and prepare smarter:

https://www.youtube.com/watch?v=Hk07gXekQ1c
A lot has changed in cloud security. The basics have not.

AI workloads, Kubernetes platforms, multi-cloud setups, serverless services, and managed databases all add complexity. But the same core questions still decide whether your environment is reasonably secure:

Who has access? What data is sensitive? What is encrypted? What is logged? Who owns each security responsibility? How often are settings reviewed? What happens during an incident?

That is exactly what a good cloud security checklist should force you to answer.

We put together 7 essential steps for reducing cloud security risk, from data classification and IAM to monitoring, automated audits, and tested response plans.

If your cloud setup has grown faster than your security process, this is a good place to start.

https://mkdev.me/posts/cloud-security-checklist-7-essential-steps
Good infrastructure code should be like good application code: clear, tested, versioned and automatically deployed.

That’s the mindset behind mkdev’s Infrastructure as Code & GitOps consulting.

Check out the page and schedule a call: https://mkdev.me/b/consulting/iac
Prompt engineering is not security engineering.

This is one of the hardest lessons for product managers building with GenAI. A system prompt may look like a clean set of rules, but it is not the same as traditional application logic. It does not guarantee behavior. It is more like a very strongly worded suggestion to the model.

That matters when your AI feature is exposed to users. A customer-facing assistant might be told not to reveal sensitive data, not to generate illegal content, not to override company policies, and not to take dangerous actions. But malicious users can still try to bypass those instructions through jailbreaks or prompt injection attacks.

The business impact is not theoretical. A badly controlled AI system can create reputational damage, legal exposure, data leakage, or operational incidents. For PMs, that means AI features need proper boundaries beyond “we wrote a careful prompt.”

Good GenAI product management means asking: What can the model access? What actions can it trigger? What happens if the user tries to manipulate it? What checks exist outside the model itself?
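
One way to picture "checks outside the model" is a plain authorization step that runs before any model-proposed action is executed. The Python sketch below is illustrative only: the action names, the ProposedAction type and the authorize function are hypothetical and not taken from the article or from any specific framework:

```python
from dataclasses import dataclass

# Hypothetical allowlist: the only actions the assistant may ever trigger.
ALLOWED_ACTIONS = {"lookup_order", "create_ticket"}


@dataclass(frozen=True)
class ProposedAction:
    """An action the model asked for. Treated as untrusted input."""
    name: str
    customer_id: str


def authorize(action: ProposedAction, session_customer_id: str) -> bool:
    """Decide in ordinary code, not in the prompt, whether the action may run."""
    if action.name not in ALLOWED_ACTIONS:
        return False  # the model asked for something it is never allowed to do
    # The model may only act on data that belongs to the logged-in customer.
    return action.customer_id == session_customer_id


# A well-behaved request, a jailbreak-style request, and a cross-customer request.
ok = authorize(ProposedAction("lookup_order", "cust-1"), "cust-1")
blocked_action = authorize(ProposedAction("issue_refund", "cust-1"), "cust-1")
blocked_customer = authorize(ProposedAction("lookup_order", "cust-2"), "cust-1")
print(ok, blocked_action, blocked_customer)
```

No matter how the user manipulates the prompt, the refund and the cross-customer lookup are rejected, because the decision does not depend on the model following instructions.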

We covered the practical risks product managers should understand in this article.

Read it here: https://mkdev.me/posts/genai-security-risks-for-product-managers-dd73bdc2-4f2e-4227-93b3-375da081d906
At small scale, microservices feel manageable.

At larger scale, every service needs to find other services, communicate securely, expose useful telemetry, support traffic shifting, and follow consistent authorization rules. Doing this separately in every application quickly becomes a mess.

That is where service mesh comes in. It gives platform teams a common layer for service-to-service communication, usually through a control plane and a data plane made of proxies.

Google Cloud’s Anthos Service Mesh, now Cloud Service Mesh, is one way to bring this model into GKE. It can simplify parts of the operational story, especially if you want managed mesh capabilities. But it also introduces important tradeoffs around sidecars, Envoy, Istio APIs, GKE Dataplane V2, eBPF, and Cilium.

The article is a good reminder that “managed” does not mean “you do not need to understand it”.

In 2026, service mesh is still powerful. It is also still something you should adopt deliberately.

https://mkdev.me/posts/is-google-cloud-anthos-service-mesh-a-mess
Infrastructure problems rarely announce themselves early. mkdev audits look into your cloud, Kubernetes and security setup, identify what needs improvement, and turn it into a practical action plan for your team. Check out the page and schedule a call: https://mkdev.me/b/audits
ClickOps is annoying when you have one project. It becomes dangerous when you have many.

That applies to OpenAI as much as it applies to AWS, Kubernetes or any other infrastructure platform. Once you have multiple teams, multiple projects, service accounts, API keys, limits and access rules, manual configuration becomes a source of inconsistency.

The Open Source Terraform Provider for OpenAI was built around that problem. It brings OpenAI administration into Terraform, so teams can manage resources in code instead of relying on screenshots, tribal knowledge and “who created this key?” conversations.

There is also a more experimental side: using OpenAI platform APIs inside Terraform workflows, including model responses and image generation, and even combining them with cloud providers like AWS.

It is a fun example, but the larger point is serious: GenAI platforms need the same engineering discipline as the rest of your infrastructure.

https://mkdev.me/posts/announcing-the-open-source-terraform-provider-for-openai