Presented by

  • David Bell

    David is a DevOps engineer who has long been interested in abandoning intuition and "gut feel" in favour of solid data to better answer the question "are my production systems healthy?" (spoilers: they probably aren't), and in helping teams answer the age-old "why does it do that in prod? it doesn't do that on my machine!"

Abstract

You could almost set your watch by it: at 2pm daily the microservice would time out and crash, the database growing increasingly slow and deadlock-prone, and the SLA perilously close to failing. Everything looked "normal": the logs showed typical requests and responses right up until it all fell over, and the metrics showed the API received more requests at other times of the day, so it wasn't overwhelmed and had capacity. But **something was different**. Was it a noisy neighbour problem on the shared database? Something malicious not caught by the WAF? Solar flares? What was going on?!

Join us on a journey into the unknown-unknowns with our guide O11y (pronounced "Ollie", short for "Observability") as we explore:

- Observability and its "three pillars"
- OpenTelemetry Tracing
- Auto- and Manual-Instrumentation
- High Cardinality, High Dimensionality, and Sampling
- Honeycomb.io's querying and trace rendering