CatOps
DevOps and other issues by Yurii Rochniak (@grem1in) - SRE @ Preply && Maksym Vlasov (@MaxymVlasov) - Engineer @ Star. Opinions are our own.

We do not post ads, including event announcements. Please do not bother us with such requests!
For today’s Donations Monday, I’d like to share with you a fundraiser for FPV drones from DeepState, the collective behind the near-real-time battlefield maps.

https://send.monobank.ua/jar/9AtiB8esqu

#donations #Ukraine
More follow-ups on the AWS outage (the Azure outage didn't generate as much press).

Lorin Hochstein analyzes the postmortem from a complexity point of view and reaches quite interesting conclusions that you can absolutely apply to your own incidents and postmortems as well.

The tl;dr is that incidents (especially big ones) are often unique. So, when reasoning about preventive measures, you need not only to prevent similar incidents but also to prepare for handling incidents in general, because the next incident may not look like this one.

#reliability #sre #aws
An article by Charity Majors on why thinking of observability in terms of pillars is limiting.

I recall a similar, older article about how Facebook does observability. It’s somewhere here in the channel.

The core idea is to treat all signals as universal wide events that preserve all the context, so you don't have to hop between different tools.
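To make the "wide event" idea a bit more concrete, here is a minimal Python sketch, assuming a hypothetical `handle_request` handler and made-up field names. It is not taken from the article: it just shows one event per unit of work that accumulates context, outcome, and timing, and is emitted once as a single structured line instead of separate logs, metrics, and spans.

```python
import json
import time
import uuid


def handle_request(user_id: str, endpoint: str, cart_items: int) -> dict:
    """Hypothetical handler that records one wide event per request."""
    start = time.monotonic()
    # All context for this unit of work lives in one event, instead of being
    # split across separate log lines, metrics, and trace spans.
    event = {
        "event_id": str(uuid.uuid4()),
        "endpoint": endpoint,
        "user_id": user_id,
        "cart_items": cart_items,
    }
    try:
        if cart_items < 0:
            raise ValueError("cart_items must be non-negative")
        # ... real business logic would run here ...
        event["status_code"] = 200
    except ValueError as exc:
        # Errors are attached to the same event rather than logged elsewhere.
        event["status_code"] = 400
        event["error"] = str(exc)
    finally:
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
        # Emit exactly once, as a single structured JSON line.
        print(json.dumps(event, sort_keys=True))
    return event


if __name__ == "__main__":
    handle_request("u-123", "/checkout", 3)
```

Because every field sits on the same event, questions like "which users on which endpoint saw 400s, and how slow were those requests" become a single query over one data set.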

#observability
For today's Donations Monday, I'd like to share with you a fundraiser for the Optic Dragons unit, a specialized FPV drone assault unit of the 92nd Separate Assault Brigade.

They're raising funds for fiber-optic drones, spare parts for converting drones to fiber optics, and support for the pilot crews' combat vehicles. The unit has been redeployed to the Pokrovsk direction, where the situation is intense, and they need more fiber reels for their optical drones.

Direct donation link:
https://send.monobank.ua/jar/7D7whfQHfF

Card number: 4441 1111 2291 2961

#donations #Ukraine
An interesting lab from AWS with an overengineered solution for right-sizing Kubernetes workloads.

Should you implement it this way? I don't know. But maybe you want to play with GitOps, AWS Bedrock, and all that stuff.

Also, it's funny how they say at the beginning that running VPA and Goldilocks inside a cluster adds overhead and an additional management burden, and then propose creating a cluster in the GHA (GitHub Actions) runtime and using generative AI to address that.

#aws #kubernetes
Press F to pay respects.

>>> Ingress NGINX Retirement: Kubernetes SIG Network and the Security Response Committee are announcing the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026. Afterward, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. Existing deployments of Ingress NGINX will continue to function and installation artifacts will remain available.

Announcement page.

#kubernetes #nginx
For today's Donations Monday, I'd like to ask you to donate to the administrative needs of the "Come Back Alive" foundation.

It takes tremendous effort to run a foundation like this, and although they could, they do not use regular donations to cover their operational needs. Thus, it's important to help them cover those needs as well!

https://savelife.in.ua/en/donate-en/#donate-fund-card-once

#donations #Ukraine
Cloudflare's postmortem for yesterday’s outage is now available.

tl;dr:
>>>
The issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind. Instead, it was triggered by a change to one of our database systems' permissions which caused the database to output multiple entries into a “feature file” used by our Bot Management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network.
<<<
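The quoted mechanism (duplicate rows in, a doubled file out, pushed everywhere) can be illustrated with a small Python sketch. Everything here is hypothetical: the database names `db_a`/`db_b`, the function names, and the `MAX_FEATURES` limit are made up for illustration, and the quote above does not say how the oversized file broke its consumers. The sketch only shows how a query that suddenly sees the same table twice doubles the output, and how validating the file before propagation (falling back to the last known good version) keeps a bad file from reaching every machine.

```python
MAX_FEATURES = 200  # hypothetical consumer-side limit, illustrative number only


def build_feature_file(rows: list[tuple[str, str]]) -> list[str]:
    """Build a feature list from (database, feature_name) rows.

    Hypothetical failure mode: if a permissions change lets the query see
    the same table in a second database, every feature appears twice.
    """
    return [name for _db, name in rows]


def validate_feature_file(features: list[str], limit: int = MAX_FEATURES) -> list[str]:
    # Reject duplicates instead of silently deduplicating them, so the
    # upstream bug is surfaced rather than hidden.
    duplicates = len(features) - len(set(features))
    if duplicates:
        raise ValueError(f"{duplicates} duplicate feature entries")
    if len(features) > limit:
        raise ValueError(f"{len(features)} features exceed the limit of {limit}")
    return features


def propagate(new_features: list[str], last_known_good: list[str]) -> list[str]:
    """Return the file that should be shipped to the fleet."""
    try:
        return validate_feature_file(new_features)
    except ValueError as exc:
        # Fail safe: keep serving the previous file instead of pushing a bad one.
        print(f"rejected new feature file, keeping last known good: {exc}")
        return last_known_good


if __name__ == "__main__":
    good = build_feature_file([("db_a", f"f{i}") for i in range(3)])
    doubled = build_feature_file(
        [("db_a", f"f{i}") for i in range(3)] + [("db_b", f"f{i}") for i in range(3)]
    )
    print(len(good), len(doubled))
    print(propagate(doubled, good))
```

Rejecting duplicates, rather than quietly deduplicating them, is a deliberate choice in this sketch: it keeps the fleet on a working file while still making the upstream change visible.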

Another interesting thing:
>>>
Unrelated to this incident, we were and are currently migrating our customer traffic to a new version of our proxy service, internally known as FL2. Both versions were affected by the issue, although the impact observed was different.
Customers deployed on the new FL2 proxy engine, observed HTTP 5xx errors. Customers on our old proxy engine, known as FL, did not see errors, but bot scores were not generated correctly, resulting in all traffic receiving a bot score of zero. Customers that had rules deployed to block bots would have seen large numbers of false positives. Customers who were not using our bot score in their rules did not see any impact.
<<<

So, if you were not affected yesterday, now you know why.

#postmortem #cloudflare
Always Be Ready to Leave (Even If You Never Do) is not about keeping your CV up to date or socializing with recruiters, as the title might suggest. It’s a short article on work habits that keep you more efficient and, probably, happier at work, even if these habits also make it easier for you to quit, should you choose to.

#culture
For today’s Donations Monday, I would like to remind you about the foundation that we’ve been partnering with for DevOps Days Ukraine for years now.

UA Responders. They specialize in medical equipment and similar supplies.

#donations #Ukraine
Do you have the "What went well" section in your postmortems?

Here's an argument for having one, with an explanation of why it matters.

tl;dr: While each incident is different, there is a set of skills and behaviors that let people improvise under pressure to mitigate an incident. These skills and behaviors can be taught, and your "What went well" section helps with that, too.

#sre #incidents
For today’s Donations Monday, let’s help the “Тихо” (Tykho) foundation raise money for FPV and Vampire drones.

https://send.monobank.ua/jar/WaFbzLzNK

This fundraiser was shared by a close friend of mine, so I trust it.

#donations #Ukraine
The bot I've used for years to post to this channel has finally died, so it seems I won't be able to make neat buttons anymore :\

Still, I have a couple of time-sensitive things for y'all:

- Cybersecurity books bundle by Packt
- Hacking book bundle by No Starch Press

Another time-sensitive topic: our friends at DOU are running their winter salary survey. More participants mean more accurate results, so jump in!

https://dou.ua/goto/rJks

#security #dou