Grafana alerts as code: Get started with Terraform and Grafana Alerting
https://grafana.com/blog/2022/09/20/grafana-alerts-as-code-get-started-with-terraform-and-grafana-alerting
SRE Bytes: The Four Golden Signals of Monitoring
https://medium.com/@chaoskyle/sre-bytes-the-four-golden-signals-of-monitoring-317420631db6
An Incident Command Training Handbook
https://blog.danslimmon.com/2019/06/24/an-incident-command-training-handbook
Effective SRE: SLO Engineering and Error Budget
https://medium.com/@info_51889/effective-sre-slo-engineering-and-error-budget-cc1ce142274b
pg_activity
pg_activity is a top-like application for monitoring PostgreSQL server activity.
https://github.com/dalibo/pg_activity
agnos
Obtain (wildcard) certificates from Let's Encrypt using dns-01 without the need for API access to your DNS provider.
https://github.com/krtab/agnos
How to Handle Kubernetes Health Checks
https://doordash.engineering/2022/08/09/how-to-handle-kubernetes-health-checks
tracetest
Tracetest is an OpenTelemetry-based tool that helps you develop and test your distributed applications. It assists you in the development process by enabling you to trigger your code and see the trace as you add OTel instrumentation. It also lets you create trace-based tests from the data contained in your OpenTelemetry trace. You can verify against both the triggering transaction's response and any of the information contained deep in a span in your trace.
https://github.com/kubeshop/tracetest
Introducing the official ClickHouse plugin for Grafana
https://grafana.com/blog/2022/05/05/introducing-the-official-clickhouse-plugin-for-grafana
Observability Best Practices when running FastAPI in a Lambda
https://www.eliasbrange.dev/posts/observability-with-fastapi-aws-lambda-powertools
k8spacket
k8spacket - packet traffic visualization for Kubernetes.
https://github.com/k8spacket/k8spacket
bindplane-op
BindPlane OP is an open source observability pipeline that gives you the ability to collect, refine, and ship metrics, logs, and traces to any destination. BindPlane OP provides the controls you need to reduce observability costs and simplify the deployment and management of telemetry agents at scale.
https://github.com/observIQ/bindplane-op
6 Best Practices for Effective Readiness and Liveness Probes
https://www.datree.io/resources/kubernetes-readiness-and-liveness-probes-best-practices
Optimizing TCP for high WAN throughput while preserving low latency
https://blog.cloudflare.com/optimizing-tcp-for-high-throughput-and-low-latency
Slowing Down to Speed Up – Circuit Breakers for Slack’s CI/CD
How Slack increased developer productivity and prevented cascading internal failures by implementing orchestration-level circuit breakers.
https://slack.engineering/circuit-breakers
gprofiler
gProfiler is a system-wide profiler, combining multiple sampling profilers to produce a unified visualization of what your CPU is spending time on.
https://github.com/Granulate/gprofiler
jc
CLI tool and Python library that converts the output of popular command-line tools, file types, and common strings to JSON, YAML, or dictionaries. This allows piping output to tools like jq and simplifies automation scripts.
https://github.com/kellyjonbrazil/jc