May 28, 2024
Cardinal Founding Team
It’s 3:30 in the morning. You get a call about something being broken, and it’s affecting customers. You’re on call, but that doesn’t mean you want to be awake right now.
Your customer service report says “We are getting a lot of calls about movies not playing.” Details, as usual, are lacking. Can’t they just tell you more? Well, perhaps not, because they are handling the angry mob of late night movie watching people.
You open up your usual tools and see nothing looks unusual other than a slight increase in error rates. The problem is, those error metrics have so many dimensions that it may take hours to identify what, if anything, is in that data. Genre, Language, Video Quality, ISP, Member Plan, Client, Client Version… It will take forever to go through all those. What if it’s a combination of these?
When you started this job a month ago, you dreaded being on call for just this reason. Not only are you on the hook to try to triage this, but you need to decide who, if anyone, you call next. Want to make a good impression and all. You open up the runbooks, and after a bit of searching, find something that might fit. You follow those steps, but nope… not useful.
Where might you go next?
Chip to the rescue!
Oh yeah! We just installed super-awesome tools from Cardinal. I’ll give them a try!
You fire up the UI and highlight the time range that customer support reported, right where you see that unexplained bump.
And suddenly, on your screen, you see who specifically has problems. All those poor late night movie watchers in California, New York, and Oklahoma are affected by a CDN outage!
Armed with this knowledge, you file a ticket with the CDN provider and head back to bed, knowing that your triage is complete and what needs to be fixed.
Thanks, Chip! Ahh, if only every on-call page was so easy. Perhaps…
Introducing Cohort Analysis
High cardinality data is extremely useful, but the number of dimensions and their values quickly makes manual “data diving” difficult. One might build dashboards showing some popular dimensions, but it’s often difficult to derive meaningful insights from them. To make matters worse, others on your team may have worked hard to reduce cardinality in order to cut costs. While they helped the balance sheet, they didn’t do you any favors, since you may now be missing the exact dimensions that could have solved your mystery.
Cardinal Cohort Analysis lets you highlight a region in a graph and, using techniques inspired by Principal Component Analysis, finds interesting patterns buried deep in the dimensions. It can find a single outlier or correlate across multiple dimensions to find root causes as quickly as possible.
Given those details, our tool goes looking for more trouble, and can show you unusual patterns in related metrics. This can quickly help narrow down the specific impact both to your system and to customers.
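To make the highlight-and-compare idea concrete, here is a minimal Python sketch. It is not Cardinal's implementation and it does not use PCA; it swaps in a much simpler frequency comparison, ranking each dimension value by how over-represented it is inside the highlighted window relative to everything outside it. The event fields (ts, state, client), the rank_cohorts helper, and the sample data are all hypothetical.

```python
from collections import Counter

def rank_cohorts(events, in_window, dimensions, min_count=5):
    """Rank (dimension, value) pairs by over-representation in the window.

    Illustrative stand-in only: a plain share-of-events comparison,
    not the PCA-inspired technique the product describes.
    """
    selected = [e for e in events if in_window(e)]
    baseline = [e for e in events if not in_window(e)]
    results = []
    for dim in dimensions:
        sel_counts = Counter(e[dim] for e in selected)
        base_counts = Counter(e[dim] for e in baseline)
        for value, count in sel_counts.items():
            if count < min_count:
                continue  # ignore values too rare to judge
            sel_share = count / len(selected)
            # Add-one smoothing so values unseen in the baseline
            # do not cause a division by zero.
            base_share = (base_counts[value] + 1) / (len(baseline) + 1)
            results.append((sel_share / base_share, dim, value))
    results.sort(reverse=True)  # highest lift first
    return results

# Hypothetical error events: before ts=100 errors are spread evenly over
# five states; from ts=100 on they come only from CA, NY, and OK.
STATES = ["CA", "NY", "OK", "TX", "WA"]
CLIENTS = ["tv", "web"]
events = [
    {"ts": ts, "state": STATES[ts % 5], "client": CLIENTS[ts % 2]}
    for ts in range(100)
]
events += [
    {"ts": ts, "state": STATES[ts % 3], "client": CLIENTS[ts % 2]}
    for ts in range(100, 130)
]

ranked = rank_cohorts(events, lambda e: e["ts"] >= 100, ["state", "client"])
for lift, dim, value in ranked[:3]:
    print(f"{dim}={value} lift={lift:.2f}")
```

In this toy data set the three affected states rise to the top while the client dimension stays near a lift of 1. A real system would also need to handle combinations of dimensions and far larger value sets.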
Chip, our AI Head of Troubleshooting, will make clear what these findings represent. Each related metric examined adds to Chip’s explanation of what you are looking at.
In the example above, once Chip found a CDN problem in three states, she searched for other metrics in the system that had an overlapping cohort in the same timeframe. She found the movie_play_starts metric to be completely broken in NY, CA, and OK. This is important because you now know the business impact of the CDN offline error from just that one highlight!
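One way to picture the "overlapping cohort" search is as a set comparison. The sketch below is an assumption about how such matching could work, not a description of what Chip actually does: it scores each candidate metric by the Jaccard similarity between its anomalous cohort and the cohort found on the primary metric. Apart from movie_play_starts, the metric names, the related_metrics helper, and the threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def related_metrics(primary_cohort, candidate_cohorts, threshold=0.5):
    """Return (score, metric) pairs whose cohort overlaps the primary one."""
    scored = [
        (jaccard(primary_cohort, cohort), name)
        for name, cohort in candidate_cohorts.items()
    ]
    return sorted([(s, n) for s, n in scored if s >= threshold], reverse=True)

# Cohort found on the (hypothetical) CDN error metric in the highlighted window.
cdn_cohort = {("state", "CA"), ("state", "NY"), ("state", "OK")}

# Anomalous cohorts detected on other metrics in the same timeframe.
# Only movie_play_starts comes from the article; the rest are made up.
candidates = {
    "movie_play_starts": {("state", "CA"), ("state", "NY"), ("state", "OK")},
    "search_latency": {("state", "CA"), ("client", "web")},
    "login_failures": {("state", "TX")},
}

matches = related_metrics(cdn_cohort, candidates)
print(matches)
```

With these inputs only movie_play_starts clears the threshold, mirroring the story above in which the play-start metric breaks in the same three states as the CDN errors.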
Below is another example of this feature, where Chip was able to identify that movieIds starting with A & B are failing license validations, which in turn was affecting play errors. Note that movieId is a very high cardinality, UUID-like attribute with tens of thousands of distinct values. Getting to fine-grained insights like these would not be possible without:
An opinionated tool that can process and effectively summarize high cardinality datasets.
The freedom to send high cardinality data without fearing its cost implications.
Don’t hide from Cardinality. Embrace it.
Chip (and by extension Cardinal) loves high cardinality data. While costs and other restrictions may have trained you and your team to send less data to a provider, we at Cardinal welcome it. Chip thrives on high cardinality, being a cardinal herself.
While other vendors will cost you several arms and at least one leg, Cardinal aims to make storage of all those data points commodity-priced. No need to pay per time series or per dimension. Store all the data.
About Cardinal
We're a team of former Netflix engineers who've spent the last decade building super-fast, dependable systems capable of handling petabytes of data. Cardinal represents the next step in our journey. We're combining tried-and-true Observability/SRE practices with cutting-edge innovations focused on cost-effectiveness and problem-solving efficiency, creating products that stand out in a crowded market.