Dev0ps

Forwarded from CatOps

I recommend the article Monitoring and Observability by Cindy Sridharan.

It explains in plain language (plain English, that is) how monitoring actually differs from "observability" and why anyone needs yet another term.

It's well written, but we all know what's really going on. To quote:

 I’m an engineer that can help provide monitoring to the other engineers in the organization
> Great, here’s $80k.
I’m an architect that can help provide observability for cloud-native, container-based applications
> Awesome! Here’s $300k!

#monitoring #observability

Medium

Monitoring and Observability

During lunch with a few friends in late July, the topic of observability came up. I have a talk coming up at Velocity in less than a month…

14:40

Dev0ps

Forwarded from CatOps

Have a read about Canopy, Facebook's system for end-to-end performance analysis.

This is exactly what would be labelled with the trendy word "observability", yet the word never appears in the article (haha!), and honestly the word itself is starting to make my eye twitch.

#monitoring #observability

the morning paper

Canopy: an end-to-end performance tracing and analysis system

Kaldor et al., SOSP’17. In 2014, Facebook published their work on ‘The Mystery Machine,’ describing an approach to end-to-end performanc…

22:19

Dev0ps

Forwarded from CatOps

Okay, you've set up monitoring. You have a pile of metrics, and they are even arranged into pretty dashboards.

Where do you look? Should you wake half the team if cpu_wio has gone up on 7% of the backends? What about on 20%? Or do we just watch valid_response_p95_rate and alert on that metric?

Of course, this is all highly situational, and different people hold different opinions about the "golden signals", i.e. the indicators that tell you whether things are overall good or overall bad right now. You can read about the different views here:

https://medium.com/devopslinks/how-to-monitor-the-sre-golden-signals-1391cadc7524

The methods in brief:

Google: Latency, Traffic, Errors, and Saturation
Brendan Gregg: Utilization, Saturation, and Errors
Tom Wilkie: Rate, Errors, and Duration

The article then walks through all of it in more detail.
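To make the Tom Wilkie variant (Rate, Errors, Duration) a bit more concrete, here is a minimal Python sketch. It is not taken from the linked article: the `Request` record, the function names, and the paging thresholds (5% errors, 500 ms p95) are all hypothetical, chosen only to show the idea of alerting on user-visible symptoms rather than on something like cpu_wio.

```python
from dataclasses import dataclass
import math


@dataclass
class Request:
    duration_ms: float  # how long the request took
    ok: bool            # False if the request ended in an error


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list; q is in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def red_signals(requests, window_s):
    """Compute Rate, Errors, Duration for one time window of requests."""
    total = len(requests)
    if total == 0:
        return {"rate_rps": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}
    errors = sum(1 for r in requests if not r.ok)
    return {
        "rate_rps": total / window_s,      # Rate: requests per second
        "error_ratio": errors / total,     # Errors: share of failed requests
        "p95_ms": percentile([r.duration_ms for r in requests], 95),  # Duration
    }


def should_page(signals, max_error_ratio=0.05, max_p95_ms=500.0):
    """Hypothetical paging rule: wake someone only on symptoms users can feel."""
    return signals["error_ratio"] > max_error_ratio or signals["p95_ms"] > max_p95_ms
```

With a 10-second window of 18 fast successful requests and 2 slow failed ones, `red_signals` reports 2 requests per second and a 10% error ratio, so `should_page` returns True under the assumed thresholds. Real thresholds depend on the service, which is exactly the "it's all highly situational" point above.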

#monitoring #observability

10:53

Dev0ps

Forwarded from CatOps

A short guide to tracing from Honeycomb.io.

#observability

06:57

Dev0ps

Forwarded from CatOps

Pinterest has open-sourced its logging agent, Singer.

Judging by the documentation, logs are meant to be written to Kafka, but there is this line:

> Extensible design: Singer can be easily extended to support data uploading to custom destinations.

#logging #observability

Medium

Open sourcing Singer, Pinterest’s performant and reliable logging agent

Yu Yang | Software Engineer, Data Engineering

14:22

Singer on GitHub

Dev0ps

Forwarded from CatOps

Framework for an Observability Maturity Model

Observability is on the minds of just about every modern dev team running a production service, and it’s time everyone spoke the same language, so we can build something greater on the top of our shared understanding. Access to observable systems is the path toward less frustration and more happiness—both for those responsible for production, and the customers they serve.

To that end, Honeycomb has published a white paper sharing its vision of observability based on goals rather than tools.

P.S. Monitoring is only a small part of this framework.
P.P.S. If you skipped the Achieving Observability guide earlier, now is the time to read it.

#books #observability

07:59

Two words about 'Achieving Observability guide'

Dev0ps

Forwarded from Пятничный деплой

Observability illustrated with a concrete example from one project: https://dzone.com/articles/microservices-observability (very accessible). #observability #tracing #msa

DZone

Microservices Observability (Part 1)

This article outlines how to observe, trace, and monitor microservices on Java applications in an Openshift environment.

08:44

Dev0ps

Forwarded from CatOps

A weekend long read on distributed tracing by Cindy Sridharan.

The article describes the problems that arise when building out tracing and how they can be addressed in principle.

#observability

Medium

Distributed Tracing — we’ve been doing it wrong

Distributed Tracing is often considered hard to deploy and it’s value proposition considered to be questionable at best. A variety of…

16:25

Dev0ps

Forwarded from CatOps

A three-year retrospective on the concept of "observability" from one of the movement's founders, Charity Majors.

The article covers how the concept emerged, why metrics alone are not yet observability, the practical side of the matter, and more.

#observability

The New Stack

Observability — A 3-Year Retrospective

A summary of the observability movement over the past three years.

17:48

Dev0ps

Forwarded from CatOps

If you can't be bothered to read anything on Mondays, here is a podcast episode about observability: an Uber engineer talks about distributed tracing.

If you'd still rather read than listen, here is an interview, also on Packt, with Charity Majors, one of the pioneers of the concept.

#observability

Packt Hub

Listen to Uber engineer Yuri Shkuro discuss distributed tracing and observability [Podcast] | Packt Hub

Uber engineer Yuri Shkuro talks about observability and distributed tracing on the Packt Podcast with Stacy Matthews and Richard Gall.

09:09

Charity Majors discusses observability and dealing with “the coming armageddon of complexity”

Dev0ps

Forwarded from CatOps

VMware is giving away a book on observability in exchange for your personal data.

#books #observability

Vmware

Solving Microservices Bottlenecks with Observability | VMware Tanzu

In this eBook, we’ll show you how to enable engineering teams to deliver high SLOs by gaining complete visibility and analytics into microservices-based applications.

08:37

Find and Resolve Microservices Bottlenecks Faster

Dev0ps

Forwarded from CatOps

Facebook describes its log streaming service, Scribe.

"Transporting petabytes per hour": you most likely don't need that kind of volume, but it makes for an interesting read.

#observability

Engineering at Meta

Scribe: Transporting petabytes per hour via a distributed, buffered queueing system

Scribe is a distributed, buffered queueing system that encapsulates all the complexity behind moving service logs from point A to point B.

08:42

Dev0ps

Forwarded from CatOps

Live stream from Monitorama, which is taking place in Baltimore right now.

#slides #observability

YouTube

2019 Monitorama Baltimore Live Stream Day 1

Monitorama takes place October 21-22, 2019 in the Pearlstone Theater of Baltimore Center Stage in Baltimore, MD. Our program consists of two days of single-track sessions and lightning talks designed to educate and inspire.

15:26